Spark and Scala
Course Duration: 25 Hours

Selfpaced Tech is a leader in Spark and Scala online training courses. We provide quality online and corporate training delivered by real-time faculty and well-trained software specialists. Our Spark and Scala online training is regarded as the best by the students who attended it, and our students were able to find jobs quickly in India, Singapore, Japan, Europe, Canada, Australia, the USA and the UK. We provide Spark and Scala online training in India, the UK, the USA, Singapore, Canada and other countries.

Rating: 4/5

Course Description

Learn the fundamentals of Spark, the technology that is revolutionizing the analytics and big data world! Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that requires low latency processing that a typical MapReduce program cannot provide, Spark is the way to go.

Course Price: $450 (regular price $500)


  • Live online instructor-led sessions by industry veterans.
  • Industry-renowned training to boost your resume.
  • Practicals, workshops, labs, quizzes and assignments.
  • Personalized one-to-one career discussions with the trainer.
  • Real-life case studies and a live project solving real problems.
  • Mock interviews and resume preparation to excel in interviews.
  • Lifetime access to the course, recorded sessions and study materials.
  • Premium job assistance and support to step ahead in your career.


Prerequisites

  • Some experience coding in Python, Java, or Scala, plus some familiarity with Big Data issues and concepts.
  • Strong Java knowledge to develop Scala applications.


Course Curriculum

  • How Spark fits in the Big Data ecosystem
  • Why Spark & Hadoop fit together
  • Driver Program
    • Spark Context
  • Cluster Manager
  • Worker
    • Executor
    • Task
  • Spark RDD
    • Spark Context
  • Spark Libraries
  • Different data sources and formats
    • HDFS
    • Amazon S3
    • Local File System
    • Text
    • JSON
    • CSV
    • Sequence File
  • Create and use RDDs and DataFrames
  • Transformation
  • Actions
  • Cache Intermediate RDD
    • Lineage Graph
    • Lazy Evaluation
  • Create Data Frame
  • Spark Interactive shell (Scala & Python)
  • Spark SQL
  • Define different ways to run your application
  • Spark Program Life Cycle
  • Function of Spark Context
  • Different Way to Launch Spark Application
    • Local
    • Standalone
    • Hadoop YARN
    • Apache Mesos
  • Launch Spark Application
    • Spark-Submit
    • Monitor the Spark Job
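The transformation/action split and lazy evaluation listed above can be illustrated without a cluster. The sketch below is plain Scala, not the Spark API: a Scala view records transformations (map) without running them, just as an RDD does, and only a terminal, action-like call (toList, like collect) forces the computation. The object name and counter are illustrative only.

```scala
object LazyEvalSketch {
  def main(args: Array[String]): Unit = {
    var mapCalls = 0

    // "Transformation": nothing runs yet, analogous to rdd.map(...)
    val doubled = (1 to 5).view.map { n => mapCalls += 1; n * 2 }
    println(s"after map: $mapCalls calls")    // after map: 0 calls

    // "Action": forces the whole pipeline, analogous to rdd.collect()
    val result = doubled.toList
    println(s"after toList: $mapCalls calls") // after toList: 5 calls
    println(result)                           // List(2, 4, 6, 8, 10)
  }
}
```

The same deferral is what lets Spark build a lineage graph first and schedule work only when an action runs.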
  • Key-Value pair
  • Apache Spark vs. Apache Hadoop MapReduce
  • Create RDD from existing non-pair RDD
  • Create pair RDD by loading certain formats
  • Create pair RDD from in-memory collection of pairs
  • groupByKey
  • reduceByKey
  • Other Transformations
    • Joins
  • RDD Partition
  • Types of Partition
    • Hash Partitioning
    • Range Partitioning
  • Benefit of Partitioning
  • Best Practices
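The pair-RDD operations above have simple plain-Scala analogues. This sketch (ordinary collections, not the Spark API; the names reduceByKey and hashPartition are illustrative) mimics what reduceByKey computes per key and how hash partitioning decides which partition a key lands in.

```scala
object PairSketch {
  // Mimics pairRDD.reduceByKey(f): group values by key, reduce each group.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).reduce(f) }

  // Mimics hash partitioning: a key goes to partition hash(key) mod n.
  def hashPartition[K](key: K, numPartitions: Int): Int =
    math.floorMod(key.hashCode, numPartitions)

  def main(args: Array[String]): Unit = {
    val counts = reduceByKey(Seq(("a", 1), ("b", 1), ("a", 1)))(_ + _)
    println(counts("a"))               // 2
    println(hashPartition("a", 4))     // some partition id in 0..3
  }
}
```

In real Spark, reduceByKey additionally combines values on each partition before shuffling, which is why it is preferred over groupByKey for aggregations.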
  • Explore Data in DataFrames
  • Create UDFs (user-defined functions)
    • UDF with Scala DSL
    • UDF with SQL
  • Repartition DataFrames
  • Infer Schema by Reflection
  • DataFrame from database table
  • DataFrame from JSON
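Two of the DataFrame ideas above, schema inference by reflection and UDFs, can be sketched in plain Scala (this is an analogy, not the Spark API; Person and ageNextYear are made-up names). Spark infers a DataFrame schema from a case class's fields; a UDF is just an ordinary function lifted to run over a column.

```scala
object DataFrameSketch {
  // Spark infers a schema by reflection from case-class fields like these.
  case class Person(name: String, age: Int)

  // A "UDF" is an ordinary function applied per column value.
  val ageNextYear: Int => Int = _ + 1

  def main(args: Array[String]): Unit = {
    val rows = Seq(Person("Ada", 36), Person("Linus", 54))
    // Roughly the spirit of: df.select($"name", myUdf($"age"))
    val projected = rows.map(p => (p.name, ageNextYear(p.age)))
    println(projected)  // List((Ada,37), (Linus,55))
  }
}
```

In real Spark code the function would be wrapped with `udf(...)` (or registered for SQL) so the optimizer can apply it inside a query plan.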
  • Spark Execution Model
  • Debug and Tune Spark Applications
  • Spark SQL
  • Spark Streaming
  • Spark MLlib
  • Spark GraphX
  • Benefits of Apache Spark over Hadoop Ecosystem
  • Spark Streaming Architecture
  • DStream and a Spark Streaming application
    • Define Use Case (Time Series Data)
    • Basic Steps
    • Save Data to HBase
  • Operations on DStream
    • Transformations
    • Data Frame and SQL Operations
  • Define Windowed Operation
    • Sliding Window
    • Windowed Computation
    • Window based Transformation
    • Window Operations
  • Fault tolerance of streaming applications
    • Fault Tolerance in Spark Streaming
    • Fault Tolerance in Spark RDD
    • Checkpointing
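The windowed operations listed above reduce over a sliding window of recent batches. The sketch below shows the same computation on a plain Scala sequence (not the DStream API; windowedSums is an illustrative name): window length 3, sliding by 1, like reduceByWindow over three batch intervals.

```scala
object WindowSketch {
  // Sum each sliding window of `window` consecutive values, sliding by 1,
  // analogous to a windowed reduce over a stream of per-batch counts.
  def windowedSums(counts: Seq[Int], window: Int): List[Int] =
    counts.sliding(window, 1).map(_.sum).toList

  def main(args: Array[String]): Unit = {
    // A "stream" of per-batch event counts.
    val counts = Seq(3, 1, 4, 1, 5, 9)
    println(windowedSums(counts, 3))  // List(8, 6, 10, 15)
  }
}
```

Real DStream windows are specified in time units (window length and slide interval), both multiples of the batch interval.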
  • Describe GraphX
  • Create a Property Graph
  • Perform Operations on Graphs
  • Describe Apache Spark MLlib
  • Classifications
  • Clustering
  • Collaborative Filtering
  • Use Collaborative filtering to predict user choice
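The core intuition behind collaborative filtering, covered above, is that users with similar rating histories predict each other's choices. A minimal plain-Scala sketch of one ingredient, cosine similarity between rating vectors (this is an illustration, not MLlib's ALS algorithm; all names and ratings are made up):

```scala
object CFSketch {
  // Cosine similarity between two users' rating vectors.
  def cosine(a: Seq[Double], b: Seq[Double]): Double = {
    val dot  = a.zip(b).map { case (x, y) => x * y }.sum
    val norm = math.sqrt(a.map(x => x * x).sum) *
               math.sqrt(b.map(y => y * y).sum)
    dot / norm
  }

  def main(args: Array[String]): Unit = {
    val alice = Seq(5.0, 3.0, 4.0)
    val bob   = Seq(4.0, 2.0, 5.0)
    val carol = Seq(1.0, 5.0, 1.0)
    // Bob's tastes are closer to Alice's than Carol's, so a user-based
    // recommender would weight Bob's ratings more when predicting for Alice.
    println(cosine(alice, bob) > cosine(alice, carol))  // true
  }
}
```

MLlib's production approach (ALS matrix factorization) learns latent factors instead, but the similarity intuition is the same.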
  • Introduction
  • A first example
  • Expressions and Simple Functions
  • First-class functions
  • Classes and Objects
  • Case classes and Pattern matching
  • Generic types and methods
  • Lists
  • For-comprehensions
  • Mutable State
  • Computing with Streams
  • Lazy Values
  • Implicit Parameters and Conversions
  • Hindley-Milner type inference
  • Abstraction for concurrency
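Several of the Scala topics above fit in one small, runnable example: case classes with pattern matching, a for-comprehension, and a lazy value (the Shape hierarchy here is an illustrative example, not course material reproduced verbatim).

```scala
object ScalaFeatures {
  // Case classes and pattern matching on a sealed hierarchy.
  sealed trait Shape
  case class Circle(r: Double) extends Shape
  case class Rect(w: Double, h: Double) extends Shape

  def area(s: Shape): Double = s match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  def main(args: Array[String]): Unit = {
    // For-comprehension: sugar for map/flatMap over the list.
    val areas = for (s <- List(Circle(1.0), Rect(2.0, 3.0))) yield area(s)
    println(areas)  // List(3.141592653589793, 6.0)

    // Lazy value: the body runs only on first access.
    lazy val expensive = { println("computing..."); 42 }
    println(expensive)  // prints "computing..." then 42
  }
}
```

The sealed trait makes the match exhaustive: the compiler warns if a Shape case is left unhandled.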


The tools you’ll need to attend training are:
  • Windows: Windows XP SP3 or higher
  • Mac: OSX 10.6 or higher
  • Internet speed: Preferably 512 Kbps or higher
  • Headset, speakers and microphone: You’ll need headphones or speakers to hear instruction clearly, as well as a microphone to talk to others. You can use a headset with a built-in microphone, or separate speakers and microphone.
The trainings are delivered by highly qualified and certified instructors with relevant industry experience.

People from various domains with no prior knowledge of this technology have been successfully trained with us and are now working in this industry. However, knowledge of the basics is an added advantage.

Get your batch scheduled at your convenient time. Schedule Now