Databricks fundamentals

This Databricks Fundamentals course gives participants a solid understanding of the internal structure and functioning of Databricks, a widely used platform for big data processing.

  • Duration: 28 hours
  • Language: English
  • Format: Online
  • Code: EAS-028
  • Price: € 700 *

Available sessions

To be determined




Description

Databricks is an increasingly popular platform for big data processing and analysis. Our Databricks Fundamentals course is a great way to start if you want to improve your skills in this area. You will acquire practical experience with important Databricks tools and ideas over the course of several modules, including writing queries in Scala, Python, and SQL, using Delta Lake / Parquet, and working with Notebooks.

 

One of the primary goals of the course is to make you more comfortable with Databricks notebooks, the web-based interface for data analysis and collaboration. With guidance from our trainer, you’ll learn how to efficiently build, manage, and share notebooks so that you can tackle complex data challenges.

 

Another important topic we will cover is Apache Spark, the open-source engine that powers Databricks’ data processing capabilities. You will gain a deep understanding of Spark’s internal architecture, including the RDD (Resilient Distributed Dataset), which according to databricks.com “is an immutable distributed collection of elements of your data, partitioned across nodes in your cluster that can be operated in parallel with a low-level API that offers transformations and actions.”
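To make the RDD idea concrete, here is a minimal single-machine sketch in plain Python. It is not the Spark API: `ToyRDD` is a hypothetical class written for illustration, and only its method names (`parallelize`, `map`, `filter`, `collect`, `reduce`) mirror the corresponding PySpark calls. It shows the three properties the definition mentions: the data is split into partitions, transformations return a new immutable object without computing anything, and only actions trigger the work.

```python
from functools import reduce


class ToyRDD:
    """Toy, single-machine stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, partitions, lineage=()):
        # Partitions are stored as tuples so the source data is never mutated.
        self._partitions = tuple(tuple(p) for p in partitions)
        # Lineage is the ordered list of pending (kind, function) steps.
        self._lineage = tuple(lineage)

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        data = list(data)
        # Deal elements round-robin into num_partitions chunks.
        return cls([data[i::num_partitions] for i in range(num_partitions)])

    # Transformations: lazy, each returns a new ToyRDD and computes nothing.
    def map(self, fn):
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def _compute(self, partition):
        # Replay the recorded lineage over a single partition.
        items = iter(partition)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: trigger computation, partition by partition.
    def collect(self):
        return [x for p in self._partitions for x in self._compute(p)]

    def reduce(self, fn):
        return reduce(fn, self.collect())


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
# Two transformations: nothing is computed yet, only the lineage grows.
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
# An action: the lineage is replayed over every partition.
total = even_squares.reduce(lambda a, b: a + b)
print(total)  # 220
```

In real Spark the partitions live on different cluster nodes and the recorded lineage is also what lets a lost partition be recomputed; the toy version keeps everything in one process.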

 

To help you make the right decisions on a project and avoid architectural errors, you’ll explore the differences between Delta Lake and Parquet, two formats Databricks uses to store data: Parquet is a columnar file format, while Delta Lake is a table format that stores its data as Parquet files and adds a transaction log on top. Understanding the particularities of these formats will help you select the best one for your project, leading to more efficient results. We will also cover a key topic for any big data environment: query writing. You'll learn how to write queries in Scala, Python, and SQL, giving you the flexibility to work with different languages and tools as needed.
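One way to picture the difference: a plain directory of Parquet files carries no record of which files make up the table at a given moment, whereas a Delta table pairs its Parquet data files with an ordered log of commits that add and remove files. The sketch below is a simplified model in plain Python, not the Delta Lake library. The log is an in-memory list, the file names are made up for illustration, and only the `add` and `remove` actions are modelled (the real log is stored as JSON commit files in a `_delta_log` directory and contains further action types).

```python
def snapshot(log, version=None):
    """Return the set of data files that make up the table at `version`.

    `log` is a list of commits; each commit is a list of actions shaped
    like {"add": {"path": ...}} or {"remove": {"path": ...}}.
    """
    if version is None:
        version = len(log) - 1  # default to the latest commit
    if not 0 <= version < len(log):
        raise ValueError(f"unknown version: {version}")
    active = set()
    # Replay commits 0..version in order to rebuild the table state.
    for commit in log[: version + 1]:
        for action in commit:
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active


# Hypothetical three-commit history of a small table.
log = [
    # Version 0: initial write of two files.
    [{"add": {"path": "part-0000.parquet"}}, {"add": {"path": "part-0001.parquet"}}],
    # Version 1: an append adds one more file.
    [{"add": {"path": "part-0002.parquet"}}],
    # Version 2: an update rewrites part-0000 as part-0003.
    [{"remove": {"path": "part-0000.parquet"}}, {"add": {"path": "part-0003.parquet"}}],
]

latest = snapshot(log)
as_of_v1 = snapshot(log, version=1)
print(sorted(latest))
print(sorted(as_of_v1))
```

Reading the table "as of version 1" simply replays fewer commits. In this simplified model, that is why updates, deletes, and time travel are possible on Delta tables without editing any Parquet file in place.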

  

You will learn how to optimize your Databricks workflows for performance and how to use the platform's visualization tools to gain insights that support better project decisions. Overall, the Databricks Fundamentals course is a detailed, practical introduction to this big data tool. With guidance from our trainer, an experienced Data Engineer, you’ll develop the skills and confidence to handle complex data tasks.

After completing the course, a certificate is issued on the Luxoft Training form.

Objectives

  • Practice working with notebooks
  • Understand Spark's internal structures
  • Understand the differences between Delta Lake and Parquet
  • Write queries in Scala, Python, and SQL
  • Learn about optimization in Databricks
  • Explore data in depth with Databricks

Target Audience

Developers, Architects

Prerequisites

At least 3 months of development experience in Scala, Java, Python, and SQL.


Roadmap

Introduction to Databricks - Theory 60% / Practice 40% - 4h

  • Creating Databricks Service
  • Databricks UI Overview
  • Databricks Architecture Overview
  • Databricks Notebooks

Databricks Cluster and Jobs - Theory 60% / Practice 40% - 4h

  • Cluster types and configuration
  • Databricks cluster pool
  • Databricks Job
  • Notebook workflows

DBFS - Theory 60% / Practice 40% - 4h

Databricks and Spark - Theory 60% / Practice 40% - 4h

  • Data Formats
  • Transformation
  • Joins, Aggregation
  • SQL

Delta Lake - Theory 60% / Practice 40% - 4h

  • Pitfalls of Data Lakes
  • Data Lakehouse Architecture
  • Read & Write to Delta Lake
  • Updates and Deletes on Delta Lake
  • Merge/Upsert to Delta Lake
  • History, Time Travel, Vacuum
  • Delta Lake Transaction Log
  • Convert from Parquet to Delta
  • Data Ingestion
  • Data Transformation - PySpark and Notebooks

Visualizations in Databricks - Theory 60% / Practice 40% - 2h

Collaboration in Databricks - Theory 60% / Practice 40% - 2h

Deploying Databricks on Azure - Theory 60% / Practice 40% - 2h

Deploying Databricks on the AWS Marketplace - Theory 60% / Practice 40% - 2h

Data Protection Use Cases - 4h


Trainer

Oleksandr Holota, Big Data and ML Trainer

