Databricks fundamentals
This Databricks Fundamentals course gives participants a solid understanding of the internal structure and operation of Databricks, a powerful big-data processing platform built on Apache Spark.
This training course covers the key concepts and methods for developing data-processing applications with Apache Spark.
To be determined
We’ll look at the Spark framework for automatic distributed code execution, along with companion projects in the MapReduce paradigm. We’ll work with the RDD, DataFrame, and Dataset APIs and express logic with both Spark SQL and the DataFrame DSL. We’ll also cover loading data to and from external storage systems such as Cassandra, Kafka, Postgres, and S3, as well as working with HDFS and common data formats.
After the training, participants will be able to build a simple PySpark application and execute it on a cluster in parallel mode.
Basic programming skills in Java, Python, or Scala, and familiarity with the Unix/Linux shell. Experience with databases is helpful but optional.