Introduction to Data Engineering
Identify key upstream and downstream collaborators and stakeholders for data engineers. Identify the stages of the data engineering lifecycle and key undercurrents. Articulate a mental framework for building data engineering solutions. Identify necessary considerations for requirements gathering.
The Data Engineering Lifecycle & Undercurrents
Articulate the structure of the data engineering lifecycle and its undercurrents. Identify key technologies in the AWS data engineering stack for different stages of the lifecycle.
Data Architecture
Think critically about components of an end-to-end data architecture. Evaluate technologies and tools against requirements and good data architecture.
Translating Requirements to Architecture
Design a data architecture on AWS based on stakeholder requirements. Implement batch and streaming pipelines on AWS.
Working with Source Systems
Identify different data formats and determine appropriate source systems. Explain relational and NoSQL databases, ACID compliance, and CRUD operations. Interact with object storage and explain cloud networking.
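As a small illustration of the CRUD and ACID objectives above, the following sketch uses Python's built-in sqlite3 module with an in-memory database. The accounts table, its columns, and the simulated failure are hypothetical and only serve to show the four CRUD operations plus atomicity (a failed transaction leaves no partial update behind); it is not a statement about which database the module itself uses.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
)

# Create: insert two rows and commit them.
with conn:
    conn.execute("INSERT INTO accounts (id, owner, balance) VALUES (1, 'ana', 100)")
    conn.execute("INSERT INTO accounts (id, owner, balance) VALUES (2, 'ben', 50)")

# Read: load owner -> balance pairs.
balances = dict(conn.execute("SELECT owner, balance FROM accounts"))

# Update inside a transaction that fails midway.
# Atomicity means the debit below must not persist on its own.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - 80 WHERE owner = 'ana'")
        raise RuntimeError("simulated failure before crediting ben")
except RuntimeError:
    pass

after_failure = dict(conn.execute("SELECT owner, balance FROM accounts"))

# Delete: remove one row.
with conn:
    conn.execute("DELETE FROM accounts WHERE owner = 'ben'")

remaining = [row[0] for row in conn.execute("SELECT owner FROM accounts")]
```

Using the connection as a context manager keeps the commit-or-rollback decision in one place, which is the behavior the atomicity property describes.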
Data Ingestion
Explain batch and streaming ingestion, their use cases, and common ingestion patterns. Interact with a REST API and create a script to ingest data. Describe the components of an event-streaming platform.
DataOps
Explain DevOps automation concepts in DataOps. Use Terraform to provision AWS resources. Apply data quality tests using Great Expectations.
Orchestration, Monitoring, and Automating Your Data Pipelines
Explain orchestration in data pipelines. Build data pipelines with DAGs in Airflow. Integrate data quality testing in an orchestrated pipeline.
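The idea behind a DAG of pipeline tasks can be sketched without Airflow itself. The example below uses Python's standard-library graphlib module to derive a valid execution order from task dependencies. The task names are hypothetical, and this is only an illustration of dependency ordering, not Airflow's API or scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks. Each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "quality_check": {"extract"},
    "transform": {"extract"},
    "load": {"quality_check", "transform"},
}

# static_order() yields tasks so that every task appears after its dependencies.
# It raises graphlib.CycleError if the graph contains a cycle.
order = list(TopologicalSorter(dependencies).static_order())

for position, task in enumerate(order, start=1):
    print(position, task)
```

An orchestrator performs the same ordering step and then adds scheduling, retries, and monitoring on top of it. Placing the data quality check as a dependency of the load task is one way to keep bad data from reaching the destination.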
Storage Ingredients and Storage Systems
Explain how data is stored on disk and in memory. Compare storage systems and explain row-oriented vs column-oriented databases.
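The row-oriented versus column-oriented distinction can be illustrated with plain Python structures. This is a minimal sketch with hypothetical order records; real databases add encoding, compression, and on-disk pages that the sketch ignores.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "country": "US", "amount": 20.0},
    {"order_id": 2, "country": "DE", "amount": 35.5},
    {"order_id": 3, "country": "US", "amount": 12.5},
]

# Column-oriented layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Analytical query (aggregate one column over all records).
total_from_rows = sum(row["amount"] for row in rows)  # visits every full record
total_from_columns = sum(columns["amount"])           # visits a single column

# Point lookup (fetch one complete record).
record_from_rows = rows[1]  # one access
record_from_columns = {name: values[1] for name, values in columns.items()}  # one access per column
```

The aggregate touches only one list in the columnar layout, while the single-record lookup is cheaper in the row layout. That trade-off is why columnar storage tends to suit analytical workloads and row storage tends to suit transactional ones.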
Storage Abstractions
Explain architectural features of data warehouses and data lakes. Implement a data lakehouse with a medallion-like architecture.
Queries
Explain the life of a query. Implement advanced SQL queries and discuss query performance strategies.
Data Modeling & Simple Transformations for Analytics
Define data modeling and apply normalization stages. Transform data in third normal form to a star schema.
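One way to picture the move from third normal form to a star schema is the sketch below, which uses Python's built-in sqlite3 module. All table names, columns, and sample rows are hypothetical. For brevity the dimensions reuse the source keys, whereas a production star schema would often introduce surrogate keys and a date dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF-style) source tables; all names are hypothetical.
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
CREATE TABLE order_items (order_id INTEGER, product_id INTEGER, quantity INTEGER, unit_price REAL);

INSERT INTO cities VALUES (1, 'Lima', 'PE'), (2, 'Oslo', 'NO');
INSERT INTO customers VALUES (10, 'Ana', 1), (11, 'Ben', 2);
INSERT INTO products VALUES (100, 'Lamp', 'Home'), (101, 'Desk', 'Office');
INSERT INTO orders VALUES (1000, 10, '2024-01-05'), (1001, 11, '2024-01-06');
INSERT INTO order_items VALUES (1000, 100, 2, 15.0), (1000, 101, 1, 120.0), (1001, 100, 1, 15.0);

-- Star schema: dimensions flatten the lookup tables, the fact table holds measures.
CREATE TABLE dim_customer AS
  SELECT c.customer_id, c.name, ci.city, ci.country
  FROM customers c JOIN cities ci ON ci.city_id = c.city_id;
CREATE TABLE dim_product AS
  SELECT product_id, name, category FROM products;
CREATE TABLE fact_sales AS
  SELECT oi.order_id, o.customer_id, oi.product_id, o.order_date,
         oi.quantity, oi.quantity * oi.unit_price AS revenue
  FROM order_items oi JOIN orders o ON o.order_id = oi.order_id;
""")

# An analytical query now needs a single join from the fact table to a dimension.
revenue_by_country = dict(conn.execute("""
    SELECT d.country, SUM(f.revenue)
    FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
    GROUP BY d.country
"""))
fact_row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

The same question against the normalized tables would need joins across order_items, orders, customers, and cities, which is the query-simplicity argument usually made for star schemas.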
Data Modeling & Simple Transformations for Machine Learning
Differentiate between types of machine learning and apply data preprocessing steps. List techniques for vectorizing textual data.
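As one possible example of vectorizing text, the sketch below builds bag-of-words count vectors in plain Python. The sample sentences are made up, and bag-of-words is chosen here only as a simple, well-known technique; the outline does not say which techniques the module covers.

```python
from collections import Counter

# Hypothetical sample documents.
documents = ["the pipeline failed", "the pipeline ran", "the job ran twice"]

# Tokenize by lowercasing and splitting on whitespace.
tokenized = [doc.lower().split() for doc in documents]

# The vocabulary fixes the meaning of each vector position.
vocabulary = sorted({token for tokens in tokenized for token in tokens})

def vectorize(tokens):
    """Return term counts for one document, ordered by the shared vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]

vectors = [vectorize(tokens) for tokens in tokenized]

for doc, vector in zip(documents, vectors):
    print(doc, vector)
```

Every document becomes a fixed-length numeric vector, which is the form most machine learning algorithms expect. Word order is lost, which is the main limitation of this representation.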
Transformations
Identify batch transformation use cases. Compare processing frameworks such as Spark and Hadoop.
Serving Data
Identify ways of serving data for analytics and machine learning. Build an end-to-end data pipeline to serve data that provides business value.