Hadoop Fundamentals
Master the essentials of Hadoop with our "Hadoop Fundamentals" course. Learn how to navigate the Hadoop ecosystem, from HDFS to MapReduce, YARN, Hive, and Spark. Gain hands-on experience in managing large-scale data processing and storage, making this course ideal for aspiring data engineers and developers.
To be determined
Hadoop Fundamentals is a comprehensive course designed to introduce you to the core components of the Hadoop ecosystem, providing the foundational knowledge and practical skills necessary to work with big data technologies. Whether you’re a beginner or have some experience, this course will equip you with the expertise needed to effectively manage and process large-scale data using Hadoop.
Course Overview:
This course offers a balanced mix of theory and practice, with 24 hours of content. You'll engage in hands-on exercises that complement the theoretical material, so that by the end of the course you are ready to apply Hadoop technologies in practical settings.
Target audience: developers, architects, database designers, and database administrators.
Prerequisites:
- Experience with NoSQL and/or relational (RDBMS) databases
- A general understanding of big data concepts
1. Basic concepts of modern data architecture (1h theory)
2. HDFS: Hadoop Distributed File System (2h theory, 1h practice)
- Architecture, replication, data in/out, HDFS commands
Practice (shell, Hue): connecting to a cluster, working with the file system
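The block and replication arithmetic behind this module can be sketched in a few lines of Python. Note the assumptions: the 128 MB block size and replication factor of 3 used below are common HDFS defaults, not figures stated in this outline.

```python
# Sketch of how HDFS splits a file into blocks and replicates them.
# 128 MiB blocks and 3x replication are common defaults (assumption);
# real clusters may be configured differently.
import math

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MiB
REPLICATION = 3                  # default replication factor

def hdfs_footprint(file_size_bytes: int) -> dict:
    """Number of HDFS blocks a file occupies and the raw storage
    consumed across the cluster once every block is replicated."""
    blocks = math.ceil(file_size_bytes / BLOCK_SIZE)
    return {
        "blocks": blocks,
        "raw_bytes": file_size_bytes * REPLICATION,
    }

# A 1 GiB file occupies 8 blocks and 3 GiB of raw cluster storage.
print(hdfs_footprint(1024 * 1024 * 1024))
```

This kind of estimate is useful in the practice session when checking `hdfs dfs -du` output against expectations.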
3. The MapReduce paradigm and its implementation in Java and Hadoop Streaming (2h theory, 1h practice)
Practice: launching MapReduce applications (Java and Hadoop Streaming)
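Word count is the classic illustration of the MapReduce paradigm. The sketch below runs the map, shuffle, and reduce phases in plain Python so it works without a cluster; with Hadoop Streaming, the same mapper and reducer logic would read stdin and write stdout.

```python
# Minimal word count in the MapReduce style: the mapper emits (word, 1)
# pairs, the framework groups pairs by key (the "shuffle"), and the
# reducer sums the counts per key. The shuffle is simulated in-process
# here so the example runs anywhere.
from itertools import groupby
from operator import itemgetter

def mapper(line: str):
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word: str, counts):
    yield (word, sum(counts))

def run_job(lines):
    # map phase
    pairs = [kv for line in lines for kv in mapper(line)]
    # shuffle: sort by key so equal keys become adjacent
    pairs.sort(key=itemgetter(0))
    # reduce phase
    result = {}
    for word, group in groupby(pairs, key=itemgetter(0)):
        for k, total in reducer(word, (c for _, c in group)):
            result[k] = total
    return result

print(run_job(["to be or not to be"]))
# {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```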
4. YARN: Distributed application execution management (1h theory, 1h practice)
- YARN architecture, application launch in YARN
Practice: launching applications and monitoring the cluster through the UI
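As a rough illustration of one thing the YARN scheduler does: a node's memory and vcore budgets each cap the number of containers it can host, and the smaller cap wins. The node and container sizes below are invented for illustration.

```python
# Back-of-the-envelope sketch of per-node container capacity in YARN:
# memory and vcores each impose a limit; the scheduler cannot exceed
# either one. All figures are illustrative, not defaults.
def max_containers(node_mem_mb: int, node_vcores: int,
                   container_mem_mb: int, container_vcores: int) -> int:
    return min(node_mem_mb // container_mem_mb,
               node_vcores // container_vcores)

# A 64 GB / 16-vcore node running 4 GB / 1-vcore containers:
print(max_containers(64 * 1024, 16, 4 * 1024, 1))   # 16
# The same node with 8 GB containers is memory-bound:
print(max_containers(64 * 1024, 16, 8 * 1024, 1))   # 8
```

Numbers like these are what the cluster UI from the practice session reports as available versus allocated resources.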
5. Introduction to Hive (2h theory, 3h practice)
- Architecture, table metadata, file formats, the HiveQL query language
Practice (Hue, hive, beeline, Tez UI): creating tables, reading & writing CSV, Parquet, ORC, partitioning, SQL queries with aggregation and joins
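Hive's partitioning from the practice session can be pictured as a directory layout: one `column=value` subdirectory per distinct value of the partition column. The sketch below mimics that layout in plain Python; the table name, warehouse path, and sample rows are invented for illustration (the actual warehouse location depends on cluster configuration).

```python
# Sketch of the directory layout Hive uses for a partitioned table:
# each distinct partition value becomes a `column=value` subdirectory,
# and the partition column itself is not stored in the data files.
# Table name, paths, and rows are illustrative.
from collections import defaultdict

rows = [
    {"id": 1, "amount": 10.0, "country": "US"},
    {"id": 2, "amount": 7.5,  "country": "DE"},
    {"id": 3, "amount": 3.2,  "country": "US"},
]

def partition_paths(table: str, rows, partition_col: str):
    """Group rows the way Hive lays them out on HDFS."""
    layout = defaultdict(list)
    for row in rows:
        path = f"/warehouse/{table}/{partition_col}={row[partition_col]}"
        layout[path].append({k: v for k, v in row.items() if k != partition_col})
    return dict(layout)

for path in sorted(partition_paths("sales", rows, "country")):
    print(path)
# /warehouse/sales/country=DE
# /warehouse/sales/country=US
```

Queries that filter on the partition column only need to read the matching subdirectories, which is why partitioning matters for the aggregation and join exercises.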
6. Introduction to Spark (2h theory, 3h practice)
- DataFrame/SQL, metadata, file formats, data sources, RDD
Practice (Zeppelin, Spark UI): reading & writing from the database (JDBC), CSV, Parquet, partitioning, SQL queries with aggregation and joins, query execution plans, monitoring
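A key idea in this module is Spark's lazy evaluation: transformations (`map`, `filter`) only record a plan, and nothing executes until an action (`collect`, `count`) forces it. The class below imitates the shape of the RDD API in plain Python; it is not PySpark and runs on a single machine.

```python
# Plain-Python sketch of Spark's lazy RDD model: transformations build
# a plan of recorded steps; the action walks the plan once over the data.
class FakeRDD:
    def __init__(self, data, plan=None):
        self._data = data
        self._plan = plan or []        # recorded transformations

    def map(self, fn):
        return FakeRDD(self._data, self._plan + [("map", fn)])

    def filter(self, pred):
        return FakeRDD(self._data, self._plan + [("filter", pred)])

    def collect(self):                 # the action: execute the plan
        out = iter(self._data)
        for kind, fn in self._plan:
            out = map(fn, out) if kind == "map" else filter(fn, out)
        return list(out)

rdd = FakeRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())   # [0, 4, 16, 36, 64]
```

The recorded plan is the single-machine analogue of the query execution plans inspected in the Spark UI during practice.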
7. Introduction to streaming data processing (2h theory, 1h practice)
- Spark Streaming, Spark Structured Streaming, Flink
Practice: reading, processing, and writing streams between Kafka, a relational database, and the file system
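The micro-batch model behind Spark Structured Streaming can be sketched without Kafka or a cluster: events arrive in small batches, and a running aggregate is updated incrementally instead of being recomputed from scratch. The sources and sinks from the practice session are stood in for here by plain lists.

```python
# Micro-batch sketch: each incoming batch updates persistent state,
# and the per-batch snapshots are what a sink would receive.
from collections import Counter

def process_stream(batches):
    state = Counter()                  # running event counts
    snapshots = []
    for batch in batches:              # each batch: a list of events
        for event in batch:
            state[event] += 1
        snapshots.append(dict(state))  # what a sink would see per batch
    return snapshots

batches = [["click", "view"], ["click"], ["view", "view"]]
print(process_stream(batches))
# [{'click': 1, 'view': 1}, {'click': 2, 'view': 1}, {'click': 2, 'view': 3}]
```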
8. Introduction to HBase (1h theory, 1h practice)
- Architecture, query language
Practice (HBase shell): writing and reading data
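HBase's data model, a table mapping sorted row keys to column families of qualifier/value cells, can be sketched in plain Python. The family and qualifier names below are invented for illustration.

```python
# Sketch of HBase's data model: row key -> {family: {qualifier: value}},
# with row keys kept sorted so prefix scans are cheap. Implemented with
# sorted plain-Python structures; not a real HBase client.
from bisect import insort

class TinyHBase:
    def __init__(self):
        self._keys = []                # row keys, kept sorted
        self._rows = {}                # key -> {family: {qualifier: value}}

    def put(self, key, family, qualifier, value):
        if key not in self._rows:
            insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].setdefault(family, {})[qualifier] = value

    def get(self, key):
        return self._rows.get(key, {})

    def scan(self, prefix=""):
        # range scan over sorted row keys, like `scan` in the HBase shell
        return [(k, self._rows[k]) for k in self._keys if k.startswith(prefix)]

t = TinyHBase()
t.put("user#42", "info", "name", "Ada")
t.put("user#42", "info", "city", "London")
t.put("user#7",  "info", "name", "Alan")
print([k for k, _ in t.scan("user#")])
# ['user#42', 'user#7']  -- lexicographic, not numeric, order
```

The scan result illustrates a classic row-key design gotcha: keys sort lexicographically, so `user#42` precedes `user#7` unless keys are zero-padded.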
Total: 13h theory (54%), 11h practice (46%)