Advanced Spark for Developers
The Advanced Spark for Developers course gives trainees a solid understanding of the internal architecture and operation of Apache Spark: Spark Core (RDD), Spark SQL and Spark Streaming.
This training course covers the key concepts and methods for developing data processing applications with Apache Spark.
To be determined
We’ll look at Spark as a framework for automated, distributed code execution, along with companion projects built on the MapReduce paradigm. We’ll work with RDDs, DataFrames and Datasets, and express processing logic with Spark SQL and the DataFrame DSL. We’ll also cover reading data from and writing data to external systems such as Cassandra, Kafka, Postgres, and S3, as well as working with HDFS and various data formats.
During the training participants will:
After the training, participants will be able to build a simple PySpark application and run it in parallel on a cluster.
Basic Java, Python and Scala programming skills. Familiarity with the Unix/Linux shell. Experience with databases is optional.