Apache Spark Fundamentals

This training course covers the key concepts and methods for developing data processing applications with Apache Spark.

  • Duration: 26 hours
  • Location: Online
  • Language: English
  • Code: EAS-017
  • Price: € 700 *

Available sessions

To be determined



Training for a group of 7-8 or more people?
We can customize the training for your specific needs.

Description

We’ll look at the Spark framework for automated distributed code execution and its companion projects in the Map-Reduce paradigm. We’ll work with RDDs, DataFrames, and Datasets, and express processing logic with Spark SQL and the DataFrame DSL. We’ll also cover loading data from and to external storage systems such as Cassandra, Kafka, Postgres, and S3, as well as working with HDFS and common data formats.
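As a small taste of the RDD API mentioned above, here is a hedged word-count sketch in the Map-Reduce style. The SparkContext is assumed to be supplied by the caller, and the path argument is illustrative; only the pure Python helpers are fixed.

```python
# Pure helpers that Spark would ship to executors via the RDD API.
def tokenize(line):
    """Lower-case a line and split it into words."""
    return line.lower().split()

def add(a, b):
    """Associative reducer suitable for reduceByKey."""
    return a + b

def word_count(sc, path):
    """Classic Map-Reduce word count over an RDD.

    sc   -- a live pyspark SparkContext (assumed)
    path -- any path readable by sc.textFile, e.g. an HDFS or S3 URI
    """
    return (sc.textFile(path)
              .flatMap(tokenize)        # line  -> words
              .map(lambda w: (w, 1))    # word  -> (word, 1)
              .reduceByKey(add))        # sum the counts per word
```

Because `tokenize` and `add` are plain functions, they can be unit-tested on the driver without starting a cluster.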

After completing the course, participants receive a certificate issued by Luxoft Training.

Objectives

During the training participants will:

  1. Write a Spark pipeline using functional Python and RDDs;
  2. Write a Spark pipeline using Python, the Spark DSL, Spark SQL, and DataFrames;
  3. Design an architecture with different data sources;
  4. Write a Spark pipeline that integrates external systems (Kafka, Cassandra, Postgres) and runs in parallel mode;
  5. Resolve problems with slow joins.
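Objective 5 usually comes down to shuffle-heavy joins against a small lookup table; one standard remedy is a broadcast (map-side) join. A minimal sketch, assuming a large DataFrame `facts` and a small DataFrame `dims` sharing a `key` column (all names illustrative):

```python
def broadcast_join(facts, dims, key="key"):
    """Join a large DataFrame against a small one without a full shuffle.

    facts -- large pyspark DataFrame
    dims  -- small dimension DataFrame (must fit in executor memory)
    """
    from pyspark.sql.functions import broadcast
    # broadcast() hints Spark to replicate `dims` to every executor,
    # turning a sort-merge join into a cheaper broadcast-hash join.
    return facts.join(broadcast(dims), on=key, how="left")
```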

After the training, participants will be able to build a simple PySpark application and run it on a cluster in parallel mode.
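A simple PySpark application of the kind described above might be sketched as follows; the app name, sample data, and threshold are illustrative assumptions, and the pure predicate is kept at module level so it can be tested without a cluster.

```python
# app.py -- could be submitted with, e.g.: spark-submit --master yarn app.py

def is_adult(age, threshold=18):
    """Pure predicate; unit-testable on the driver without Spark."""
    return age >= threshold

def main():
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("simple-app").getOrCreate()
    df = spark.createDataFrame([("alice", 34), ("bob", 12)], ["name", "age"])

    # Spark DSL filter; the same logic in Spark SQL would read:
    #   SELECT name FROM people WHERE age >= 18
    adults = df.filter(F.col("age") >= 18).select("name")
    adults.show()
    spark.stop()

if __name__ == "__main__":
    main()
```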


Target Audience

  • Software developers
  • Software architects

Prerequisites

Basic Java, Python, or Scala programming skills and familiarity with a Unix/Linux shell. Experience with databases is helpful but optional.


Roadmap

  • Spark concepts and architecture
  • Programming with RDDs: transformations and actions
  • Using key/value pairs
  • Loading and storing data
  • Accumulators and broadcast variables
  • Spark SQL, DataFrames, Datasets
  • Spark Streaming
  • Machine learning using MLlib and Spark ML
  • Graph analysis using GraphX
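Of the roadmap items above, accumulators deserve a concrete illustration: they gather side information (for example, counts of malformed records) across executors back to the driver. A hedged sketch, assuming a live SparkContext; `try_parse_int` is a hypothetical helper:

```python
def try_parse_int(text):
    """Return the int value of `text`, or None if it is malformed."""
    try:
        return int(text.strip())
    except ValueError:
        return None

def parse_counting_bad(sc, rdd):
    """Parse an RDD of strings, counting malformed lines in an accumulator.

    sc  -- a live pyspark SparkContext (assumed)
    rdd -- an RDD of strings
    """
    bad = sc.accumulator(0)

    def parse(line):
        value = try_parse_int(line)
        if value is None:
            bad.add(1)   # executors add; only the driver reads the total
            return []
        return [value]

    # Note: the accumulator value is reliable only after an action runs.
    return rdd.flatMap(parse), bad
```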

