Apache Spark Fundamentals
Duration: 26 hours
Location: Online
Language: English
Code: EAS-017
Description
After completing the course, a certificate is issued on a Luxoft Training form.
Objectives
During the training, participants will:
- Write a Spark pipeline via functional Python and RDDs;
- Write a Spark pipeline via Python, Spark DSL, Spark SQL and DataFrame;
- Design an architecture that combines different data sources;
- Write a Spark pipeline that works with external systems (Kafka, Cassandra, Postgres) and runs in parallel mode;
- Resolve problems with slow joins.
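The last objective concerns slow joins. One common remedy, when one side of the join is small, is a broadcast hash join: the small table is copied to every executor, and the large table is joined partition by partition without being shuffled. In PySpark this strategy can be requested with `pyspark.sql.functions.broadcast`, for example `large_df.join(broadcast(small_df), "key")`. The sketch below is only a plain-Python model of the idea, not Spark's implementation; the function name `broadcast_hash_join` and the list-of-partitions input format are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, str]


def broadcast_hash_join(
    large_partitions: List[List[Row]],
    small_table: Iterable[Row],
) -> List[Tuple[Hashable, str, str]]:
    """Inner-join every partition of a large table with a small table.

    Models a broadcast hash join: the small side is turned into a hash
    table once (in Spark it would be shipped to every executor), and each
    partition of the large side probes it locally, so the large side is
    never shuffled. Hypothetical helper for illustration only.
    """
    # Build side: key -> all matching values from the small table.
    lookup: Dict[Hashable, List[str]] = defaultdict(list)
    for key, value in small_table:
        lookup[key].append(value)

    joined: List[Tuple[Hashable, str, str]] = []
    # Probe side: each partition is handled independently, like one task.
    for partition in large_partitions:
        for key, left_value in partition:
            # .get() does not insert missing keys into the defaultdict.
            for right_value in lookup.get(key, []):
                joined.append((key, left_value, right_value))
    return joined


if __name__ == "__main__":
    orders = [[(1, "order-a"), (2, "order-b")], [(1, "order-c"), (3, "order-d")]]
    countries = [(1, "DE"), (2, "PL")]
    print(broadcast_hash_join(orders, countries))
    # [(1, 'order-a', 'DE'), (2, 'order-b', 'PL'), (1, 'order-c', 'DE')]
```

The trade-off is memory: every worker holds a full copy of the small table, so this approach only pays off when that table comfortably fits in executor memory.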
After the training, participants will be able to build a simple PySpark application and execute it on the cluster in parallel mode.
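The objectives above mention writing a pipeline "via functional Python and RDDs". To illustrate that style, here is a plain-Python sketch of the classic word count, written as the same flatMap, map and reduceByKey chain an RDD pipeline uses; it runs without Spark, so the data flow can be inspected locally. The helper names `reduce_by_key` and `word_count` are hypothetical, lowercasing the input is an assumption, and the PySpark chain in the docstring is shown only for comparison (`sc` is assumed to be an existing `SparkContext`).

```python
from itertools import chain
from operator import add
from typing import Callable, Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]) -> Dict[K, V]:
    """Merge values that share a key with a binary function (cf. RDD.reduceByKey)."""
    result: Dict[K, V] = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Count words with the flatMap -> map -> reduceByKey chain used on RDDs.

    Rough PySpark equivalent (not executed here):
        (sc.textFile(path)
           .flatMap(lambda line: line.lower().split())
           .map(lambda word: (word, 1))
           .reduceByKey(add))
    """
    # flatMap: one line -> zero or more words
    words = chain.from_iterable(line.lower().split() for line in lines)
    # map: word -> (word, 1)
    pairs = ((word, 1) for word in words)
    # reduceByKey: add up the 1s for each word
    return reduce_by_key(pairs, add)


if __name__ == "__main__":
    print(word_count(["to be or", "not to be"]))
    # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a real Spark job, each step runs lazily across partitions and only an action such as `collect()` triggers execution; the local version evaluates eagerly when the dictionary is built.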
Target Audience
- Software developers
- Software architects
Prerequisites
Basic programming skills in Java, Python, and Scala. Familiarity with the Unix/Linux shell. Experience with databases is optional.
Roadmap
- Spark concepts and architecture
- Programming with RDDs: transformations and actions
- Using key/value pairs
- Loading and storing data
- Accumulators and broadcast variables
- Spark SQL, DataFrames, Datasets
- Spark Streaming
- Machine learning using MLlib and Spark ML
- Graph analysis using GraphX
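The "Using key/value pairs" topic rests on how Spark regroups data by key. The sketch below models a `reduceByKey`-style shuffle in plain Python: each input partition is pre-aggregated locally (a map-side combine), the partial results are routed to reducers by `hash(key) % num_reducers`, and each reducer merges what it received. Using Python's built-in `hash` as the partitioner is an assumption for illustration; Spark's own partitioners differ in detail, and the names `combine_locally` and `shuffle_reduce` are hypothetical.

```python
from operator import add
from typing import Callable, Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, int]
Combine = Callable[[int, int], int]


def combine_locally(partition: List[Pair], func: Combine) -> Dict[Hashable, int]:
    """Pre-aggregate one partition so fewer records need to be shuffled."""
    combined: Dict[Hashable, int] = {}
    for key, value in partition:
        combined[key] = func(combined[key], value) if key in combined else value
    return combined


def shuffle_reduce(partitions: List[List[Pair]], num_reducers: int, func: Combine) -> List[Dict[Hashable, int]]:
    """Model a reduceByKey-style shuffle: combine, hash-partition, merge."""
    buckets: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for partition in partitions:
        for key, value in combine_locally(partition, func).items():
            # With a positive modulus, Python's % is never negative,
            # so every key maps to a valid reducer index.
            buckets[hash(key) % num_reducers].append((key, value))
    # Each reducer merges the partial results routed to it.
    return [combine_locally(bucket, func) for bucket in buckets]


if __name__ == "__main__":
    data = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
    reducers = shuffle_reduce(data, num_reducers=2, func=add)
    merged = {key: value for reducer in reducers for key, value in reducer.items()}
    print(sorted(merged.items()))  # [('a', 3), ('b', 1), ('c', 1)]
```

Because the map-side combine sends one record per key per partition instead of one per input pair, `reduceByKey`-style aggregation typically shuffles far less data than grouping all values first and reducing afterwards.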