01: Introduction to Big Data Concepts
  • Big Data introduction
  • OLTP vs OLAP
  • SQL vs NoSQL
  • Data Warehouses vs Data Lakes
  • Batch vs Streaming processing
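The batch vs streaming distinction can be shown without any framework. The plain-Python sketch below assumes a toy stream of (product, quantity) events. It computes the same totals in two ways: once over a complete, bounded dataset (batch), and incrementally with running state as micro-batches arrive (streaming). It is a conceptual illustration only, not Spark Structured Streaming code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int]  # (product, quantity)

def batch_totals(events: Iterable[Event]) -> Dict[str, int]:
    """Batch: the whole bounded dataset is available, so aggregate it in one pass."""
    totals: Dict[str, int] = defaultdict(int)
    for product, qty in events:
        totals[product] += qty
    return dict(totals)

class StreamingTotals:
    """Streaming: events arrive in micro-batches, so keep running state between them."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = defaultdict(int)

    def process(self, micro_batch: Iterable[Event]) -> Dict[str, int]:
        for product, qty in micro_batch:
            self.state[product] += qty
        # Emit the current (possibly partial) result after every micro-batch.
        return dict(self.state)

events = [("apple", 3), ("pear", 1), ("apple", 2), ("pear", 4)]
print(batch_totals(events))        # {'apple': 5, 'pear': 5}

stream = StreamingTotals()
print(stream.process(events[:2]))  # {'apple': 3, 'pear': 1}
print(stream.process(events[2:]))  # {'apple': 5, 'pear': 5}
```

Once the stream has consumed every event, both approaches agree. The streaming version, however, had a usable partial answer after the first micro-batch.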
02: Apache Spark Programming Essentials
  • Spark architecture (Driver, Executors, Cluster Manager)
  • RDD vs DataFrame vs Dataset
  • Transformations vs Actions
  • Lazy evaluation & Catalyst optimizer
  • Writing Spark code in PySpark
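Transformations vs actions and lazy evaluation can be modeled in a few lines. The hypothetical `LazyDataset` class below is a toy, not Spark's implementation. Its `map` and `filter` methods only record steps in a plan, and nothing runs until an action such as `collect` or `count` is called. Real Spark also passes the plan through the Catalyst optimizer before execution, which this sketch does not attempt.

```python
from typing import Any, Callable, Iterable, List, Tuple

class LazyDataset:
    """Toy model of lazy evaluation: transformations build a plan, actions run it."""

    def __init__(self, source: Iterable[Any],
                 plan: Tuple[Tuple[str, Callable], ...] = ()) -> None:
        self._source = list(source)
        self._plan = plan

    # Transformations: return a new LazyDataset and do no work yet.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    # Actions: execute the recorded plan from the source.
    def collect(self) -> List[Any]:
        rows = iter(self._source)
        for kind, fn in self._plan:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)

    def count(self) -> int:
        # Re-runs the whole plan, much as Spark recomputes an uncached DataFrame.
        return len(self.collect())

calls = []

def double(x: int) -> int:
    calls.append(x)  # record each time the function actually executes
    return x * 2

ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
print(calls)         # [] : defining transformations did no work
print(ds.collect())  # [6, 8]
print(ds.count())    # 2
```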
03: Spark SQL & DataFrame Analytics
  • DataFrame operations (select, where, groupBy)
  • Joins, aggregations, window functions
  • UDFs & performance considerations
  • Temporary & global views
  • Data exploration & profiling with Spark SQL
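Spark SQL window functions use the standard SQL `OVER (PARTITION BY ... ORDER BY ...)` syntax. So that the example runs without a cluster, the sketch below uses Python's built-in `sqlite3` module as a stand-in. It assumes SQLite 3.25 or newer, the release that added window functions. The table and column names are hypothetical. In PySpark, the DataFrame equivalent would use `pyspark.sql.Window`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 100), ("east", 2, 150),
     ("west", 1, 80), ("west", 2, 120), ("west", 3, 50)],
)

# Running total per region ordered by month, plus each row's rank by amount within its region.
query = """
SELECT region,
       month,
       amount,
       SUM(amount) OVER (PARTITION BY region ORDER BY month)          AS running_total,
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC)   AS rank_in_region
FROM sales
ORDER BY region, month
"""
for row in conn.execute(query):
    print(row)
# ('east', 1, 100, 100, 2)
# ('east', 2, 150, 250, 1)
# ('west', 1, 80, 80, 2)
# ('west', 2, 120, 200, 1)
# ('west', 3, 50, 250, 3)
```

Unlike `GROUP BY`, a window function keeps every input row and adds the aggregate alongside it. That is why it suits running totals and rankings.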
04: Azure Data Lake & Cloud Storage Foundations
  • Azure Data Lake Storage Gen1 vs Gen2
  • Hierarchical namespace, ACLs & RBAC
  • Storage account configuration (Blob, ADLS)
  • Data organization (Bronze/Silver/Gold)
  • Accessing ADLS using Databricks & ADF
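For a Bronze/Silver/Gold layout, Spark and Databricks usually address ADLS Gen2 through `abfss://<container>@<account>.dfs.core.windows.net/<path>` URIs, which target the `dfs` endpoint rather than the `blob` endpoint. The helper below is a minimal sketch. It assumes one container with a top-level folder per layer; one container per layer is an equally common choice. The account, container and dataset names in the example are hypothetical.

```python
VALID_LAYERS = ("bronze", "silver", "gold")

def layer_path(account: str, container: str, layer: str, dataset: str) -> str:
    """Build an ABFSS URI for a dataset inside one medallion layer.

    Assumed layout: <container>/<layer>/<dataset> in a single storage account.
    """
    if layer not in VALID_LAYERS:
        raise ValueError(f"layer must be one of {VALID_LAYERS}, got {layer!r}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{layer}/{dataset}"

# Hypothetical storage account "contosolake" and container "lakehouse".
print(layer_path("contosolake", "lakehouse", "bronze", "sales/orders"))
# abfss://lakehouse@contosolake.dfs.core.windows.net/bronze/sales/orders
```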
05: Data Movement & Transformation in Azure
  • ADF overview (Linked Services, Datasets, Pipelines)
  • Integration Runtime & ADF architecture
  • ETL vs ELT in ADF
  • Copy Activity, Lookup, ForEach, Conditional logic
  • Mapping Data Flows (joins, transformations, data quality)
  • Incremental loads & Change Data Capture (CDC)
  • Orchestrating data pipelines end-to-end
  • Data migration scenarios
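Incremental loads are often implemented with a high-watermark column. Each run copies only rows whose `modified_at` is newer than the last recorded value, then advances the watermark. This mirrors the Lookup (old and new watermark) plus Copy pattern commonly built in ADF. The sketch below uses in-memory SQLite for the source and a dict for the sink. All table and column names are hypothetical. A watermark catches inserts and updates but not hard deletes, which is one reason log-based CDC exists.

```python
import sqlite3
from typing import Dict, Optional, Tuple

def incremental_load(src: sqlite3.Connection,
                     target: Dict[int, Tuple[int, str]],
                     watermark: str) -> Tuple[int, str]:
    """Copy rows changed after `watermark` into `target`; return (rows copied, new watermark)."""
    new_watermark: Optional[str] = src.execute(
        "SELECT MAX(modified_at) FROM orders").fetchone()[0]
    if new_watermark is None:  # empty source table: nothing to copy
        return 0, watermark
    rows = src.execute(
        "SELECT id, amount, modified_at FROM orders "
        "WHERE modified_at > ? AND modified_at <= ?",
        (watermark, new_watermark),
    ).fetchall()
    for row_id, amount, modified_at in rows:
        target[row_id] = (amount, modified_at)  # upsert by primary key
    return len(rows), new_watermark

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER, modified_at TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10, "2024-01-01T09:00"), (2, 20, "2024-01-02T09:00")])

target: Dict[int, Tuple[int, str]] = {}
wm = "1970-01-01T00:00"                     # initial watermark
n1, wm = incremental_load(src, target, wm)  # first run loads both rows

src.execute("UPDATE orders SET amount = 25, modified_at = '2024-01-03T09:00' WHERE id = 2")
src.execute("INSERT INTO orders VALUES (3, 30, '2024-01-03T10:00')")
n2, wm = incremental_load(src, target, wm)  # only the update and the insert
n3, wm = incremental_load(src, target, wm)  # no changes since last run
print(n1, n2, n3, wm)                       # 2 2 0 2024-01-03T10:00
```

ISO-8601 strings compare correctly as text, so they work as watermarks here. In a real pipeline, the watermark would be persisted in a control table between runs rather than held in a variable.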
06: Introduction to Azure Databricks & Lakehouse
  • Databricks workspace & components
  • Lakehouse architecture
  • Clusters, Jobs, Notebooks, SQL Warehouses
  • Databricks Repos & version control
  • When to use Databricks vs ADF vs Synapse
07: Databricks Workspace, Clusters & Notebooks
  • Workspace UI deep dive
  • Cluster types & autoscaling
  • Notebook management
  • REST API & Databricks CLI setup
  • Using Git (Repos) in Databricks
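The Databricks REST API authenticates with a bearer token. The Databricks CLI reads the same host and token from a configuration profile or from the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables. The standard-library sketch below builds a request to the Clusters API list endpoint (`GET /api/2.0/clusters/list`) but does not send it. The workspace URL and token are placeholders.

```python
import urllib.request

def clusters_list_request(host: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the Clusters API list endpoint."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

# Placeholder workspace URL and token; never hard-code a real token.
req = clusters_list_request("https://adb-1234567890123456.7.azuredatabricks.net",
                            "dapi-EXAMPLE-TOKEN")
print(req.get_method(), req.full_url)

# Sending it against a real workspace (requires `import json`):
# with urllib.request.urlopen(req) as resp:
#     clusters = json.load(resp).get("clusters", [])
```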
08: Data Ingestion Techniques for the Lakehouse
  • Ingesting CSV, JSON, XML, Parquet
  • Auto Loader (cloudFiles)
  • Mounting ADLS/Blob
  • Streaming ingestion basics
  • Optimizing ingestion for high-volume data
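Auto Loader processes each new file in a landing location once. It keeps track of the files it has already discovered in its checkpoint location. In Databricks it is invoked through `spark.readStream.format("cloudFiles")`, with options such as `cloudFiles.format`. The plain-Python function below is only a toy model of that checkpoint idea, using a JSON file of seen file names. It is not Auto Loader's implementation, and the paths are temporary and hypothetical.

```python
import json
import os
import tempfile
from typing import List

def discover_new_files(landing_dir: str, checkpoint_file: str) -> List[str]:
    """Return files in landing_dir not seen by earlier runs, then record them as seen."""
    seen = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            seen = set(json.load(f))
    current = sorted(name for name in os.listdir(landing_dir)
                     if os.path.isfile(os.path.join(landing_dir, name)))
    new_files = [name for name in current if name not in seen]
    # Persist progress so the next run skips everything already processed.
    with open(checkpoint_file, "w") as f:
        json.dump(sorted(seen | set(new_files)), f)
    return new_files

with tempfile.TemporaryDirectory() as tmp:
    landing = os.path.join(tmp, "landing")
    os.makedirs(landing)
    checkpoint = os.path.join(tmp, "checkpoint.json")  # kept outside the landing folder
    for name in ("a.json", "b.json"):
        open(os.path.join(landing, name), "w").close()
    first = discover_new_files(landing, checkpoint)   # ['a.json', 'b.json']
    second = discover_new_files(landing, checkpoint)  # []
    open(os.path.join(landing, "c.json"), "w").close()
    third = discover_new_files(landing, checkpoint)   # ['c.json']
    print(first, second, third)
```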
09: Data Management, Governance & Unity Catalog
  • DBFS vs external tables
  • Metastore & data governance
  • Unity Catalog: catalogs, schemas, tables, permissions
  • Lineage & auditing
  • Securing data lake access
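Unity Catalog names tables with a three-level `catalog.schema.table` namespace. Reading a table requires `USE CATALOG` on the catalog, `USE SCHEMA` on the schema and `SELECT` on the table. Privileges granted on a parent object are inherited by its children, so `GRANT SELECT ON SCHEMA main.sales TO analysts` covers every table in that schema. The toy checker below models only these rules. It ignores ownership, `ALL PRIVILEGES`, admin roles and quoted names that contain dots, and the grants and group names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical grants: securable name -> {principal: {privileges}}
Grants = Dict[str, Dict[str, Set[str]]]

def has_privilege(grants: Grants, principal: str, obj: str, privilege: str) -> bool:
    """True if `privilege` is granted on obj or on any parent (privileges inherit downward)."""
    parts = obj.split(".")
    for depth in range(1, len(parts) + 1):
        name = ".".join(parts[:depth])
        if privilege in grants.get(name, {}).get(principal, set()):
            return True
    return False

def can_select(grants: Grants, principal: str, table: str) -> bool:
    """Check the three requirements for reading catalog.schema.table."""
    catalog, schema, _ = table.split(".")
    return (
        has_privilege(grants, principal, catalog, "USE CATALOG")
        and has_privilege(grants, principal, f"{catalog}.{schema}", "USE SCHEMA")
        and has_privilege(grants, principal, table, "SELECT")
    )

grants: Grants = {
    "main": {"analysts": {"USE CATALOG"}},
    "main.sales": {"analysts": {"USE SCHEMA", "SELECT"}},
}
print(can_select(grants, "analysts", "main.sales.orders"))   # True: SELECT inherited from schema
print(can_select(grants, "analysts", "main.hr.salaries"))    # False: no USE SCHEMA on main.hr
print(can_select(grants, "engineers", "main.sales.orders"))  # False: no grants at all
```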
10: Databricks Utilities, Widgets & Automation
  • dbutils for file, secret, and job management
  • Widgets for parameterized notebooks
  • CI/CD with Databricks Repos
  • Automating workflows with Jobs & Pipelines
  • Operational best practices
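Parameterized notebooks and Jobs connect through widgets. A job's notebook task passes `base_parameters`, and the notebook declares a matching widget (for example with `dbutils.widgets.text`) and reads its value with `dbutils.widgets.get`. The sketch below builds a request body for the Jobs API `POST /api/2.1/jobs/create` endpoint with a single notebook task. The job name, notebook path, cluster ID and parameter names are all hypothetical. Keeping such a definition in the Git repository is one way to bring jobs under CI/CD.

```python
import json
from typing import Dict

def notebook_job_payload(name: str, notebook_path: str, cluster_id: str,
                         params: Dict[str, str]) -> dict:
    """Build a Jobs API 2.1 create body containing one notebook task."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "run_notebook",
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    # Each key should match a widget name in the notebook;
                    # values are strings, read there via dbutils.widgets.get("<key>").
                    "base_parameters": params,
                },
            }
        ],
    }

payload = notebook_job_payload(
    name="daily-sales-load",                   # hypothetical
    notebook_path="/Repos/team/etl/load_sales",  # hypothetical
    cluster_id="0123-456789-abcdefgh",         # hypothetical
    params={"run_date": "2024-01-31", "env": "dev"},
)
print(json.dumps(payload, indent=2))
```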
11: Delta Lake Architecture & Operations
  • ACID transactions
  • Delta logs & versioning
  • Schema enforcement & evolution
  • Time travel & auditing
  • Table optimization (OPTIMIZE, ZORDER)
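Delta Lake versioning and time travel rest on an ordered transaction log in the table's `_delta_log` directory. Each commit records the data files it adds and removes, and an earlier version is reconstructed by replaying commits up to that version. The hypothetical `ToyDeltaLog` class below models only this replay logic in plain Python. Real commits are JSON files with richer metadata. In Databricks you would inspect them with `DESCRIBE HISTORY` and query past versions with `SELECT ... VERSION AS OF n`.

```python
from typing import Dict, List, Set

# Each commit is a list of file actions, loosely modeled on Delta's "add"/"remove" actions.
Commit = List[Dict[str, str]]

class ToyDeltaLog:
    def __init__(self) -> None:
        self.commits: List[Commit] = []  # list index == table version

    def commit(self, actions: Commit) -> int:
        """Append one atomic commit and return its version number."""
        self.commits.append(actions)
        return len(self.commits) - 1

    def files_as_of(self, version: int) -> Set[str]:
        """Replay commits 0..version to find the live data files (time travel)."""
        if not 0 <= version < len(self.commits):
            raise ValueError(f"version {version} does not exist")
        live: Set[str] = set()
        for actions in self.commits[: version + 1]:
            for action in actions:
                if action["op"] == "add":
                    live.add(action["path"])
                elif action["op"] == "remove":
                    live.discard(action["path"])
        return live

log = ToyDeltaLog()
v0 = log.commit([{"op": "add", "path": "part-000.parquet"}])  # initial write
v1 = log.commit([{"op": "add", "path": "part-001.parquet"}])  # append
# OPTIMIZE-style compaction: remove two small files and add one larger file in one commit.
v2 = log.commit([{"op": "remove", "path": "part-000.parquet"},
                 {"op": "remove", "path": "part-001.parquet"},
                 {"op": "add", "path": "part-002.parquet"}])
print(sorted(log.files_as_of(v1)))  # ['part-000.parquet', 'part-001.parquet']
print(sorted(log.files_as_of(v2)))  # ['part-002.parquet']
```

Files removed by a commit stay on storage until `VACUUM` deletes them. That retention is what keeps older versions readable.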
12: LakeFlow & Modern Data Orchestration
  • LakeFlow overview
  • Orchestrating ingestion & transformation
  • Integrating DLT, Auto Loader, and pipelines
  • Monitoring, lineage, and governance
  • Event-driven pipeline architectures
13: Power BI Integration
  • Connecting Power BI to Databricks SQL Warehouse
  • Import vs DirectQuery
  • Optimizing queries for BI
  • Using Delta tables for analytics
  • Publishing & refreshing dashboards
