Kafka Fundamentals

Master the essentials of Kafka with our "Kafka Fundamentals" course. Learn about Kafka architecture, topics, and APIs, and gain hands-on experience with AVRO, Schema Registry, SpringBoot, and streaming pipelines. Perfect for developers and data engineers looking to build and optimize real-time data processing applications.

  • Duration: 24 hours
  • Location: Online
  • Language: English
  • Code: EAS-026
  • Price: € 650 *

Available sessions

To be determined



Training for a group of 7-8 or more people?
The course can be customized for your specific needs

Description

Kafka Fundamentals is a comprehensive course designed to provide you with a deep understanding of Apache Kafka, one of the most popular platforms for building real-time data pipelines and streaming applications. This course is ideal for developers, data engineers, and system architects who want to learn how to design, build, and manage Kafka-based solutions effectively.


The course begins with an exploration of Kafka’s architecture, where you’ll learn how to plan and design your own distributed queue. You’ll address key questions related to message format, consumption patterns, data persistence, and retention, and understand how to support multiple producers and consumers efficiently.


Next, you’ll dive into Kafka topics, console producers, and console consumers, learning how to create topics with multiple partitions, ensure data replication, and manage message order and data skew. You’ll also gain practical experience in optimizing message writing and reading for different use cases, including low latency and maximum compression scenarios.


The course then covers working with Kafka using various programming languages, including Java, Scala, and Python. You’ll build simple consumers and producers, manage consumer groups, and handle transactions. This module also explores integration with Web UI and REST APIs.
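
A minimal Java consumer along these lines is sketched below (assuming a local broker; the topic events and group id demo-group are hypothetical). The Scala and Python clients follow the same subscribe/poll pattern.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group"); // consumers sharing this id split the partitions
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }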


A dedicated module on AVRO and Schema Registry follows, where you’ll learn how to add and manage AVRO schemas, build AVRO consumers and producers, and use Schema Registry to ensure data consistency. You’ll also learn how to handle errors using specific error topics.
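
On the producer side this can look like the following sketch, assuming Confluent's KafkaAvroSerializer is on the classpath and Schema Registry runs at localhost:8081; the users-avro topic and the User schema are hypothetical. The serializer registers the schema if it is new and rejects records that do not match it.

    import io.confluent.kafka.serializers.KafkaAvroSerializer;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class AvroProducerExample {
        // A hypothetical schema for illustration.
        private static final String USER_SCHEMA =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"}]}";

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");          // assumption: local broker
            props.put("schema.registry.url", "http://localhost:8081"); // assumption: local Schema Registry
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", KafkaAvroSerializer.class.getName());

            Schema schema = new Schema.Parser().parse(USER_SCHEMA);
            GenericRecord user = new GenericData.Record(schema);
            user.put("name", "Alice");

            // The serializer registers the schema (if new) and validates the record against it.
            try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("users-avro", "user-1", user));
            }
        }
    }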


In the SpringBoot and SpringCloud module, you’ll learn how to integrate Kafka with Spring applications. You’ll write templates for Spring Apps, add Kafka Templates for producers and consumers, and modify Spring Boot to work in asynchronous mode. The course also covers streaming pipelines, where you’ll compare Kafka Streams, KSQL, Kafka Connect, Akka Streams, Spark Streaming, and Flink. You’ll learn how to build robust streaming pipelines and manage checkpoints, backpressure, and executor management.
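
A minimal Spring sketch of both directions might look like this, assuming a configured spring-kafka dependency with default broker settings; the greetings topic and group id are hypothetical.

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Service;

    @Service
    public class GreetingService {
        private final KafkaTemplate<String, String> kafkaTemplate;

        public GreetingService(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        // Producing: KafkaTemplate sends asynchronously and returns a future.
        public void send(String message) {
            kafkaTemplate.send("greetings", message);
        }

        // Consuming: Spring creates and manages the underlying KafkaConsumer.
        @KafkaListener(topics = "greetings", groupId = "greeting-service")
        public void listen(String message) {
            System.out.println("Received: " + message);
        }
    }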


Finally, the course concludes with Kafka monitoring, where you’ll learn how to build and manage Kafka metrics using tools like Grafana, ensuring your Kafka deployments are optimized and well-monitored.
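
Grafana typically reads broker and client metrics through JMX (often via a Prometheus exporter), but the same metrics are also visible from code. A small sketch, assuming an already-created consumer:

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;

    import java.util.Map;

    public class MetricsDump {
        // Print every metric the client exposes, e.g. records-lag-max,
        // fetch-rate, bytes-consumed-rate; the same names appear in JMX,
        // which is what a Grafana/Prometheus setup typically scrapes.
        static void dump(KafkaConsumer<?, ?> consumer) {
            for (Map.Entry<MetricName, ? extends Metric> entry : consumer.metrics().entrySet()) {
                System.out.printf("%s.%s = %s%n",
                        entry.getKey().group(), entry.getKey().name(), entry.getValue().metricValue());
            }
        }
    }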


Learning Outcomes: By the end of this course, participants will:

  • Understand Kafka’s architecture and design distributed queues with optimal performance.
  • Efficiently manage Kafka topics, producers, and consumers, ensuring data consistency and performance.
  • Integrate Kafka with Java, Scala, Python, and other languages via REST, and handle transactions effectively.
  • Implement AVRO schemas and use Schema Registry to manage data serialization and deserialization.
  • Build and optimize streaming pipelines using Kafka Streams, KSQL, and other streaming frameworks.
  • Monitor Kafka clusters effectively using Grafana and other monitoring tools.

After completing the course, participants receive a certificate issued by Luxoft Training.

Objectives

Upon completion of the "Kafka Fundamentals" course, trainees will be able to:

  • Design and implement distributed queues using Kafka, with a focus on message format, order, and persistence.
  • Create and manage Kafka topics, partitions, and replicas, ensuring optimal performance and reliability.
  • Develop and integrate Kafka consumers and producers in Java, Scala, Python, and through REST APIs.
  • Use AVRO and Schema Registry to manage data serialization and ensure compatibility across services.
  • Build and manage robust streaming pipelines with Kafka Streams, KSQL, and other streaming frameworks.
  • Monitor Kafka clusters, set up metrics, and optimize Kafka performance using tools like Grafana.

Target Audience

Developers, Architects, Data Engineers

Prerequisites

Development experience in Java (over 6 months)


Roadmap

1. Module 1: Kafka Architecture: theory 2h / practice 1.5h

  • Planning your own distributed queue in pairs: writing, reading, and keeping data in parallel mode.
  • What's the format and average size of messages?
  • Can messages be repeatedly consumed?
  • Are messages consumed in the same order they were produced?
  • Does data need to be persisted?
  • What is data retention?
  • How many producers and consumers are we going to support?

2. Module 2: Kafka-topics, console-consumer, console-producer: theory 2h / practice 1.5h

  • Using the built-in kafka-topics, console-consumer, and console-producer tools
  • Create a topic with 3 partitions and replication factor (RF) = 2
  • Send a message and check the ISR
  • Organize message writing/reading with message order preserved
  • Organize message writing/reading without order preservation, using hash partitioning
  • Organize message writing/reading without data skew
  • Read messages from the start, the end, and a specific offset (see the sketch after this list)
  • Read a topic with 2 partitions using 2 consumers in one consumer group (and in different consumer groups)
  • Choose the optimal number of consumers for reading a topic with 4 partitions
  • Write messages with minimal latency
  • Write messages with maximum compression
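
The offset exercises above can also be driven from code instead of the console tools. A minimal sketch, assuming a local broker; the topic demo is hypothetical:

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.List;
    import java.util.Properties;

    public class SeekExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
            props.put("group.id", "seek-demo");
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                TopicPartition partition = new TopicPartition("demo", 0);
                consumer.assign(List.of(partition)); // manual assignment, no group rebalancing

                consumer.seekToBeginning(List.of(partition)); // read from the start
                // consumer.seekToEnd(List.of(partition));    // ...or from the end
                // consumer.seek(partition, 42L);             // ...or from a specific offset
            }
        }
    }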

3. Module 3: Web UI + Java, Scala, Python API + other languages (via REST): theory 2h / practice 1.5h

  • Build a simple consumer and producer
  • Add one more consumer to the consumer group
  • Write a consumer that reads 3 records from the 1st partition
  • Add writing to another topic
  • Add a transaction (see the sketch after this list)
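
For the transaction exercise, one possible shape of the producer side is sketched below, assuming a local broker; the topics topic-a and topic-b are hypothetical and error handling is simplified.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class TransactionalProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption: local broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // A stable transactional id enables transactions and fences zombie producers.
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-tx-1");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();
                try {
                    producer.beginTransaction();
                    // Both writes become visible to read_committed consumers atomically.
                    producer.send(new ProducerRecord<>("topic-a", "key", "first"));
                    producer.send(new ProducerRecord<>("topic-b", "key", "second"));
                    producer.commitTransaction();
                } catch (Exception e) {
                    // Simplified: fatal errors such as ProducerFencedException
                    // require closing the producer rather than aborting.
                    producer.abortTransaction();
                    throw e;
                }
            }
        }
    }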

4. Module 4: AVRO + Schema Registry: theory 2h / practice 1.5h

  • Add an AVRO schema
  • Compile the Java class
  • Build an AVRO consumer and producer with a specific record
  • Add Schema Registry
  • Add an error topic together with Schema Registry (see the error-routing sketch after this list)
  • Build an AVRO consumer and producer with a generic record
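
One shape the error-topic exercise can take is routing records that fail deserialization or validation to a dedicated topic for later inspection and replay. The sketch below uses hypothetical topic and class names and is not the course's reference solution.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ErrorTopicRouter {
        private final KafkaProducer<String, byte[]> errorProducer;

        public ErrorTopicRouter(KafkaProducer<String, byte[]> errorProducer) {
            this.errorProducer = errorProducer;
        }

        // Process a raw record; on any failure, forward the original bytes
        // to the error topic so they can be inspected and replayed later.
        public void process(ConsumerRecord<String, byte[]> record) {
            try {
                handle(record.value()); // application-specific deserialization/validation
            } catch (Exception e) {
                errorProducer.send(new ProducerRecord<>("users-avro-errors", record.key(), record.value()));
            }
        }

        private void handle(byte[] payload) {
            // Hypothetical placeholder: real code would decode the AVRO payload here.
            if (payload == null || payload.length == 0) {
                throw new IllegalArgumentException("empty payload");
            }
        }
    }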

5. Module 5: SpringBoot + SpringCloud: theory 2h / practice 1.5h

Homework:

  • Write a template for the Spring app
  • Add a Kafka Template for the producer
  • Add a Kafka Template for the consumer
  • Add a REST controller
  • Modify the Spring Boot app to work in async (parallel) mode (see the sketch after this list)
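
For the async (parallel) step, one option is Spring Kafka's listener concurrency, sketched below; the greetings topic and group id are hypothetical.

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class ParallelListener {
        // concurrency = "3" starts three listener threads, one per partition
        // (assuming the topic has at least 3 partitions), so records are
        // processed in parallel rather than on a single thread.
        @KafkaListener(topics = "greetings", groupId = "greeting-service", concurrency = "3")
        public void listen(String message) {
            System.out.println(Thread.currentThread().getName() + " received: " + message);
        }
    }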

6. Module 6: Streaming Pipelines (Kafka Streams + KSQL + Kafka Connect vs Akka Streams vs Spark Streaming vs Flink): theory 2h / practice 1.5h

Homework:

  • Choose a way to read data from a Kafka topic with 50 partitions (see the Kafka Streams sketch after this list)
  • Try to use the checkpoint mechanism
  • Start five executors and kill some of them
  • Check the backpressure
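
In Kafka Streams, committed offsets and local state stores play the checkpoint role that Spark and Flink expose explicitly, and backpressure is implicit because the library only polls as fast as it processes. A minimal pipeline sketch, assuming a local broker and hypothetical topics input and output:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    import java.util.Properties;

    public class StreamsPipeline {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-pipeline"); // also names the consumer group
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumption: local broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            // Read from "input", transform, write to "output"; Kafka Streams
            // handles partition assignment and offset commits, and scales by
            // adding more instances with the same application id.
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> source = builder.stream("input");
            source.mapValues(value -> value.toUpperCase()).to("output");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }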

7. Module 7: Kafka Monitoring: theory 2h / practice 1.5h

Homework:

Build several metrics in Grafana

Total: theory 14h (58%) / practice 10h (42%)


Trainer

Oleksandr Holota
Big Data and ML Trainer

