Modern Data Management Approaches in Real World Cases
Description
In application design, two of the most important decisions are how to build a scalable architecture and which data storage method to select. For decades, relational databases were the first and only option, so projects differed only in their degree of normalization, the location of business logic, and similar details. Over the last 10-15 years, many alternative systems have appeared, from object-oriented and document-oriented DBMSs to distributed file systems and data stream processing systems.
This course reviews a range of modern solutions that work together to solve the problem of collecting statistics from gaming cards. We will study read/write paths, physical stores, data formats, data volumes, and the pros and cons of storage models such as relational, document-oriented, message queue, key-value, MPP, and in-memory.
The course gives a detailed architecture review of Kafka, MongoDB, and Cassandra in modern architectures and compares their usage with RDBMS approaches. It also provides an overview of modern data architecture: we will study the real-world high-load architecture of Nvidia, which combines relational databases, message queues, data storage, key-value stores, and MPP distributed data storage.
is issued on the Luxoft Training form
Objectives
Upon completion of the course, students will be able to:
- Build a real-world architecture for collecting statistics from more than 20M gaming cards;
- Understand read/write paths, physical stores, data formats, data volumes, and the pros and cons of storage models such as relational, document-oriented, message queue, key-value, MPP, and in-memory;
- Understand which data and request characteristics have to be considered during requirements analysis and the selection of data management systems;
- Know the possibilities and limitations of modern relational and non-relational data management systems;
- Analyze requirements while selecting database management systems.
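As an illustration of the data and request characteristics mentioned above, a back-of-envelope estimate might look like the sketch below. Only the device count comes from this description; the reporting frequency, event size, replication factor, and retention period are assumptions chosen purely for illustration, and the names `Workload` and `estimate` are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload; only `devices` comes from the course text."""
    devices: int = 20_000_000             # gaming cards (from the objectives above)
    events_per_device_per_day: int = 24   # ASSUMPTION: one report per hour
    bytes_per_event: int = 500            # ASSUMPTION: average serialized size
    replication_factor: int = 3           # ASSUMPTION: copies kept by the store
    retention_days: int = 365             # ASSUMPTION: how long data is kept


def estimate(w: Workload) -> dict:
    """Return the average write rate and storage volume for the given workload."""
    events_per_day = w.devices * w.events_per_device_per_day
    raw_bytes_per_day = events_per_day * w.bytes_per_event
    stored_bytes = raw_bytes_per_day * w.replication_factor * w.retention_days
    return {
        "events_per_day": events_per_day,
        "avg_writes_per_sec": events_per_day / SECONDS_PER_DAY,
        "raw_gb_per_day": raw_bytes_per_day / 1e9,   # decimal gigabytes
        "stored_tb_total": stored_bytes / 1e12,      # decimal terabytes
    }


if __name__ == "__main__":
    for name, value in estimate(Workload()).items():
        print(f"{name}: {value:,.1f}")
```

The result is an average only; peak load and read patterns would have to be estimated separately before a storage system is chosen.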
Target Audience
Architects, application developers, analysts, and database administrators.
Roadmap
- Real-world architecture for collecting statistics from more than 24M gaming cards. Estimates. [theory: 1 hour; practice: 1 hour]
- The evolution of approaches to data storage: databases, data storage systems, database machines, massively parallel architectures, hyperconvergence. [theory: 0.5 hour]
- Relational model: which problems can be solved and at what cost; replication, sharding, distributed transactions. [theory: 0.5 hour]
- Document-oriented model. [MongoDB] [theory: 2.5 hours; practice: 1.5 hours]
- Message queues and streaming platforms. Data stream processing. [Spark Streaming] [theory: 2 hours; practice: 2 hours]
- Minimal "key-value" model: key structure options, value structure options, programming interfaces. Efficiency of non-relational databases: necessary and sufficient conditions. [Cassandra, HBase] [theory: 2 hours; practice: 2 hours]
- Distributed file systems: cluster architecture [HDFS]. SQL over distributed file systems: possible architectures, limitations, transactions. [Hive, Spark, Spark SQL, Parquet, ORC] [theory: 2 hours; practice: 2 hours]
- Distributed in-memory data storage systems. [Hazelcast, Ignite, Tarantool] [theory: 0.5 hour]
- Distributed OLAP systems. [Druid] [theory: 0.5 hour]
16 hours (+ 1 hour bonus). Theory: 8.5 h (55%); practice: 7.5 h (45%).
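To give a flavor of the key structure and sharding topics in the roadmap, the following minimal sketch shows one possible composite key layout and a stable hash-based shard assignment. It is not taken from the course materials: the key layout, the `#` separator, the function names, and the example identifiers are all hypothetical, and real systems such as Cassandra or HBase use their own partitioners.

```python
import hashlib


def make_key(card_id: str, metric: str, day: str) -> str:
    """Build a composite key: partitioning part first, then the sortable parts.

    ASSUMPTION: '#' never appears inside the individual components.
    """
    return f"{card_id}#{metric}#{day}"


def shard_for(card_id: str, shard_count: int) -> int:
    """Map a card to a shard with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used here to get the same shard on every run.
    """
    digest = hashlib.sha256(card_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


if __name__ == "__main__":
    key = make_key("card-42", "gpu_temp", "2024-01-01")  # hypothetical values
    print(key, "-> shard", shard_for("card-42", shard_count=8))
```

Hashing only the card identifier keeps all records for one card on the same shard, which makes per-card reads cheap but can cause hot spots if a few cards produce most of the traffic.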