Udemy – Data Engineering using Databricks on AWS and Azure – Durga Viswanatha Raju Gadiraju


Build Data Engineering Pipelines using Databricks core features such as Spark, Delta Lake, cloudFiles, etc.

In this course, you will learn Data Engineering using Databricks, a cloud platform-agnostic technology.

About Data Engineering

Data Engineering is the processing of data to meet downstream needs. As part of Data Engineering, we build different types of pipelines, such as Batch Pipelines and Streaming Pipelines. Roles related to data processing, conventionally known as ETL Development, Data Warehouse Development, etc., are now consolidated under Data Engineering.

About Databricks

Databricks is one of the most popular cloud platform-agnostic data engineering tech stacks. Databricks engineers are committers to the Apache Spark project. The Databricks Runtime provides Spark while leveraging the elasticity of the cloud, and you pay only for what you use. Over time, Databricks introduced the Lakehouse concept, which provides the features required for traditional BI as well as AI and ML. Here are some of the core features of Databricks:

Spark – Distributed Computing

Delta Lake – Perform CRUD operations. It is primarily used to insert, update, and delete data in files stored in a Data Lake.

cloudFiles (Auto Loader) – Ingest new files incrementally and efficiently by leveraging cloud features.

Databricks SQL – A Photon-based interface fine-tuned for running queries submitted by reporting and visualization tools. It is also used for ad-hoc analysis.
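The Delta Lake entry above describes inserting, updating, and deleting data in a Data Lake. As a rough sketch of the upsert (MERGE) semantics the course covers, the plain-Python example below models a table as a dict keyed by a primary key. It is not the Delta Lake API: `merge_upsert`, `orders`, and `changes` are hypothetical names, and real Delta tables apply such changes transactionally to Parquet files in storage.

```python
from typing import Dict, Hashable, List

Row = Dict[str, object]


def merge_upsert(target: Dict[Hashable, Row], source: List[Row], key: str) -> Dict[Hashable, Row]:
    """Plain-Python illustration of MERGE/upsert semantics (not the Delta Lake API).

    Source rows whose key exists in the target update that row; the rest are inserted.
    """
    keys = [row[key] for row in source]
    if len(keys) != len(set(keys)):
        # Delta Lake's MERGE fails when several source rows match one target row;
        # this sketch conservatively rejects any duplicate source key.
        raise ValueError("duplicate keys in source")

    result = {k: dict(v) for k, v in target.items()}  # copy so the input is untouched
    for row in source:
        k = row[key]
        if k in result:
            result[k].update(row)  # like WHEN MATCHED THEN UPDATE SET *
        else:
            result[k] = dict(row)  # like WHEN NOT MATCHED THEN INSERT *
    return result


# Hypothetical sample data.
orders = {1: {"order_id": 1, "status": "NEW"}, 2: {"order_id": 2, "status": "NEW"}}
changes = [{"order_id": 2, "status": "SHIPPED"}, {"order_id": 3, "status": "NEW"}]
merged = merge_upsert(orders, changes, key="order_id")
```

Order 2 is updated, order 3 is inserted, and order 1 is carried over unchanged, which is the behavior an upsert is meant to provide.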

Course Details

As part of this course, you will be learning Data Engineering using Databricks.

Getting Started with Databricks

Set up a Local Development Environment to develop Data Engineering Applications using Databricks

Using Databricks CLI to manage files, jobs, clusters, etc. related to Data Engineering Applications

Spark Application Development Cycle to build Data Engineering Applications

Databricks Jobs and Clusters

Deploy and Run Data Engineering Jobs on Databricks Job Clusters as Python Applications

Deploy and Run Data Engineering Jobs on Databricks Job Clusters using Notebooks

Deep Dive into Delta Lake using Dataframes on Databricks Platform

Deep Dive into Delta Lake using Spark SQL on Databricks Platform

Building Data Engineering Pipelines using Spark Structured Streaming on Databricks Clusters

Incremental File Processing using Spark Structured Streaming leveraging Databricks Auto Loader cloudFiles

Overview of Auto Loader cloudFiles File Discovery Modes – Directory Listing and File Notifications

Differences between Auto Loader cloudFiles File Discovery Modes – Directory Listing and File Notifications

Differences between traditional Spark Structured Streaming and leveraging Databricks Auto Loader cloudFiles for incremental file processing.

Overview of Databricks SQL for Data Analysis and reporting.
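The list above contrasts traditional Spark Structured Streaming file processing with Auto Loader's directory listing mode. The core idea of directory listing can be sketched in plain Python, assuming a local temporary directory stands in for cloud storage and an in-memory set stands in for the checkpoint: each run lists the directory and picks up only the files it has not seen before. This is a conceptual sketch, not how Auto Loader is implemented; `discover_new_files`, `run_batch`, and `touch` are hypothetical helpers.

```python
import os
import tempfile
from typing import List, Set


def discover_new_files(directory: str, processed: Set[str]) -> List[str]:
    """List the directory and return files not yet recorded as processed, sorted by name."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)) and name not in processed
    )


def run_batch(directory: str, processed: Set[str]) -> List[str]:
    """One trigger-once style batch: discover new files, 'process' them, record them."""
    new_files = discover_new_files(directory, processed)
    for name in new_files:
        # A real pipeline would read the file and write its data here (omitted).
        processed.add(name)  # stands in for the checkpoint that remembers ingested files
    return new_files


def touch(path: str) -> None:
    """Create an empty file (hypothetical helper for the demo)."""
    with open(path, "w"):
        pass


with tempfile.TemporaryDirectory() as landing:  # stands in for a cloud storage prefix
    checkpoint: Set[str] = set()
    touch(os.path.join(landing, "a.json"))
    touch(os.path.join(landing, "b.json"))
    first = run_batch(landing, checkpoint)   # both files are new
    second = run_batch(landing, checkpoint)  # nothing new since the last run
    touch(os.path.join(landing, "c.json"))
    third = run_batch(landing, checkpoint)   # only the newly added file
```

In this naive version, every run lists the whole directory, so its cost grows with the number of files; avoiding repeated listings is one of the motivations for the file notification mode.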

We will be adding a few more modules related to PySpark, Spark with Scala, Spark SQL, and Streaming Pipelines in the coming weeks.

Desired Audience

Here is the desired audience for this advanced course.

Experienced application developers with prior knowledge and experience of Spark who want to gain expertise in Data Engineering.

Experienced Data Engineers who want to gain enough skills to add Databricks to their profile.

Testers who want to improve their testing capabilities for Data Engineering applications built using Databricks.

Prerequisites

Logistics

A computer with a decent configuration (at least 4 GB RAM; 8 GB is highly desirable)

A dual-core CPU is required; quad-core is highly desirable

Chrome Browser

High-Speed Internet

Valid AWS Account

Valid Databricks Account (a free Databricks account is not sufficient)

Experience as Data Engineer especially using Apache Spark

Knowledge of cloud concepts such as storage, users, roles, etc.

Associated Costs

The training provides only the material. You need to practice using your own or your corporate cloud account and Databricks account.

You need to take care of the associated AWS or Azure costs.

You need to take care of the associated Databricks costs.

Training Approach

Here are the details related to the training approach.

It is self-paced, with reference material, code snippets, and videos provided on Udemy.

One needs to sign up for their own Databricks environment to practice all the core features of Databricks.

We recommend completing 2 modules per week, spending 4 to 5 hours per week.

It is highly recommended to complete all the tasks to gain real experience with Databricks.

Support will be provided through Udemy Q&A.

Here is the detailed course outline.

Getting Started with Databricks on Azure

As part of this section, we will go through the details of signing up for Azure and setting up a Databricks cluster on Azure.

Sign up for the Azure Account

Login and Increase Quotas for regional vCPUs in Azure

Create Azure Databricks Workspace

Launching Azure Databricks Workspace or Cluster

Quick Walkthrough of Azure Databricks UI

Create Azure Databricks Single Node Cluster

Upload Data using Azure Databricks UI

Overview of Creating Notebook and Validating Files using Azure Databricks

Develop Spark Application using Azure Databricks Notebook

Validate Spark Jobs using Azure Databricks Notebook

Export and Import of Azure Databricks Notebooks

Terminating Azure Databricks Cluster and Deleting Configuration

Delete Azure Databricks Workspace by deleting Resource Group

Azure Essentials for Databricks – Azure CLI

As part of this section, we will go through the details of setting up the Azure CLI to manage Azure resources using relevant commands.

Azure CLI using Azure Portal Cloud Shell

Getting Started with Azure CLI on Mac

Getting Started with Azure CLI on Windows

Warming up with Azure CLI – Overview

Create Resource Group using Azure CLI

Create ADLS Storage Account within Resource Group

Add Container as part of Storage Account

Overview of Uploading the data into ADLS File System or Container

Set up Data Set locally to upload into ADLS File System or Container

Upload local directory into Azure ADLS File System or Container

Delete Azure ADLS Storage Account using Azure CLI

Delete Azure Resource Group using Azure CLI

Mount ADLS onto Azure Databricks to access files from Azure Blob Storage

As part of this section, we will go through the details related to mounting Azure Data Lake Storage (ADLS) onto Azure Databricks Clusters.

Mount ADLS onto Azure Databricks – Introduction

Ensure Azure Databricks Workspace

Set up Databricks CLI on Mac or Windows using Python Virtual Environment

Configure Databricks CLI for new Azure Databricks Workspace

Register an Azure Active Directory Application

Create Databricks Secret for AD Application Client Secret

Create ADLS Storage Account

Assign IAM Role on Storage Account to Azure AD Application

Set up Retail DB Dataset

Create ADLS Container or File System and Upload Data

Start Databricks Cluster to mount ADLS

Mount ADLS Storage Account onto Azure Databricks

Validate ADLS Mount Point on Azure Databricks Clusters

Unmount the mount point from Databricks

Delete Azure Resource Group used for Mounting ADLS onto Azure Databricks

Setup Local Development Environment for Databricks

As part of this section, we will go through the details of setting up a local development environment for Databricks using tools such as PyCharm, Databricks Connect, Databricks dbutils, etc.

Set up Single Node Databricks Cluster

Install Databricks Connect

Configure Databricks Connect

Integrating PyCharm with Databricks Connect

Integrate Databricks Cluster with Glue Catalog

Set up AWS S3 Bucket and Grant Permissions

Mounting S3 Buckets into Databricks Clusters

Using Databricks dbutils from IDEs such as PyCharm

Using Databricks CLI

As part of this section, we will get an overview of Databricks CLI to interact with Databricks File System or DBFS.

Introduction to Databricks CLI

Install and Configure Databricks CLI

Interacting with Databricks File System using Databricks CLI

Getting Databricks Cluster Details using Databricks CLI

Databricks Jobs and Clusters

As part of this section, we will go through the details related to Databricks Jobs and Clusters.

Introduction to Databricks Jobs and Clusters

Creating Pools in Databricks Platform

Create Cluster on Azure Databricks

Request to Increase CPU Quota on Azure

Creating Job on Databricks

Submitting Jobs using Databricks Job Cluster

Create Pool in Databricks

Running Job using Interactive Databricks Cluster Attached to Pool

Running Job Using Databricks Job Cluster Attached to Pool

Exercise – Submit the application as a job using Databricks interactive cluster

Deploy and Run Spark Applications on Databricks

As part of this section, we will go through the details related to deploying Spark Applications on Databricks Clusters and also running those applications.

Prepare PyCharm for Databricks

Prepare Data Sets

Move files to ghactivity

Refactor Code for Databricks

Validating Data using Databricks

Set up Data Set for Production Deployment

Access File Metadata using Databricks dbutils

Build Deployable bundle for Databricks

Running Jobs using Databricks Web UI

Get Job and Run Details using Databricks CLI

Submitting Databricks Jobs using CLI

Set up and Validate Databricks Client Library

Resetting the Job using Databricks Jobs API

Run Databricks Job programmatically using Python

Detailed Validation of Data using Databricks Notebooks

Deploy and Run Spark Jobs using Notebooks

As part of this section, we will go through the details related to deploying Spark Applications on Databricks Clusters and also running those applications using Databricks Notebooks.

Modularizing Databricks Notebooks

Running Job using Databricks Notebook

Refactor application as Databricks Notebooks

Run Notebook using Databricks Development Cluster

Deep Dive into Delta Lake using Spark Data Frames on Databricks

As part of this section, we will go through all the important details related to Databricks Delta Lake using Spark Data Frames.

Introduction to Delta Lake using Spark Data Frames on Databricks

Creating Spark Data Frames for Delta Lake on Databricks

Writing Spark Data Frame using Delta Format on Databricks

Updating Existing Data using Delta Format on Databricks

Delete Existing Data using Delta Format on Databricks

Merge or Upsert Data using Delta Format on Databricks

Deleting using Merge in Delta Lake on Databricks

Point in Snapshot Recovery using Delta Logs on Databricks

Deleting unnecessary Delta Files using Vacuum on Databricks

Compaction of Delta Lake Files on Databricks
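The Point in Snapshot Recovery and Vacuum lessons above both rely on the Delta transaction log. A simplified, plain-Python model of the idea: each commit records which data files were added or removed, a past version is rebuilt by replaying commits up to that version, and files referenced only by older versions are what vacuuming can delete. This is an illustrative sketch that assumes a list of (action, file) pairs per commit; it is not Delta Lake's actual log format, and `snapshot_as_of`, `vacuum_candidates`, and the file names are hypothetical.

```python
from typing import List, Set, Tuple

# One commit = a list of (action, data_file) pairs; version N is the N-th commit.
Commit = List[Tuple[str, str]]


def snapshot_as_of(log: List[Commit], version: int) -> Set[str]:
    """Replay commits 0..version and return the data files that make up that version."""
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} is not in the log (0..{len(log) - 1})")
    live: Set[str] = set()
    for commit in log[: version + 1]:
        for action, data_file in commit:
            if action == "add":
                live.add(data_file)
            elif action == "remove":
                live.discard(data_file)
            else:
                raise ValueError(f"unknown action {action!r}")
    return live


def vacuum_candidates(log: List[Commit]) -> Set[str]:
    """Files referenced by some older version but no longer part of the latest one.

    A real VACUUM also applies a retention period; that is deliberately left out here.
    """
    ever_added = {f for commit in log for action, f in commit if action == "add"}
    return ever_added - snapshot_as_of(log, len(log) - 1)


log: List[Commit] = [
    [("add", "part-0000.parquet"), ("add", "part-0001.parquet")],     # v0: initial write
    [("remove", "part-0001.parquet"), ("add", "part-0002.parquet")],  # v1: an update rewrote part-0001
]
v0 = snapshot_as_of(log, 0)
v1 = snapshot_as_of(log, 1)
stale = vacuum_candidates(log)
```

Once `part-0001.parquet` is physically deleted, version 0 can no longer be reconstructed, which is why vacuuming limits how far back point-in-time recovery can go.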

Deep Dive into Delta Lake using Spark SQL on Databricks

As part of this section, we will go through all the important details related to Databricks Delta Lake using Spark SQL.

Introduction to Delta Lake using Spark SQL on Databricks

Create Delta Lake Table using Spark SQL on Databricks

Insert Data to Delta Lake Table using Spark SQL on Databricks

Update Data in Delta Lake Table using Spark SQL on Databricks

Delete Data from Delta Lake Table using Spark SQL on Databricks

Merge or Upsert Data into Delta Lake Table using Spark SQL on Databricks

Using Merge Function over Delta Lake Table using Spark SQL on Databricks

Point in Snapshot Recovery using Delta Lake Table using Spark SQL on Databricks

Vacuuming Delta Lake Tables using Spark SQL on Databricks

Compaction of Delta Lake Tables using Spark SQL on Databricks

Accessing Databricks Cluster Terminal via Web as well as SSH

As part of this section, we will see how to access the terminal of a Databricks Cluster via Web as well as SSH.

Enable Web Terminal in Databricks Admin Console

Launch Web Terminal for Databricks Cluster

Set up SSH for the Databricks Cluster Driver Node

Validate SSH Connectivity to the Databricks Driver Node on AWS

Limitations of SSH and comparison with Web Terminal related to Databricks Clusters

Installing Software on Databricks Clusters using init scripts

As part of this section, we will see how to bootstrap Databricks clusters by installing relevant third-party libraries for our applications.

Set up gen_logs on Databricks Cluster

Overview of Init Scripts for Databricks Clusters

Create Script to install software from git on Databricks Cluster

Copy init script to DBFS location

Create Databricks Standalone Cluster with init script

Quick Recap of Spark Structured Streaming

As part of this section, we will get a quick recap of Spark Structured Streaming.

Validate Netcat on Databricks Driver Node

Push log messages to Netcat Webserver on Databricks Driver Node

Reading Web Server logs using Spark Structured Streaming

Writing Streaming Data to Files

Incremental Loads using Spark Structured Streaming on Databricks

As part of this section, we will understand how to perform incremental loads using Spark Structured Streaming on Databricks.

Overview of Spark Structured Streaming

Steps for Incremental Data Processing on Databricks

Configure Databricks Cluster with Instance Profile

Upload GHArchive Files to AWS S3 using Databricks Notebooks

Read JSON Data using Spark Structured Streaming on Databricks

Write using Delta file format using Trigger Once on Databricks

Analyze GHArchive Data in Delta files using Spark on Databricks

Add New GHActivity JSON files on Databricks

Load Data Incrementally to Target Table on Databricks

Validate Incremental Load on Databricks

Internals of Spark Structured Streaming File Processing on Databricks

Incremental Loads using Auto Loader cloudFiles on Databricks

As part of this section, we will see how to perform incremental loads using Auto Loader cloudFiles on Databricks Clusters.

Overview of Auto Loader cloudFiles on Databricks

Upload GHArchive Files to S3 on Databricks

Write Data using Auto Loader cloudFiles on Databricks

Add New GHActivity JSON files on Databricks

Load Data Incrementally to Target Table on Databricks

Add New GHActivity JSON files on Databricks

Overview of Handling S3 Events using AWS Services on Databricks

Configure IAM Role for cloudFiles file notifications on Databricks

Incremental Load using cloudFiles File Notifications on Databricks

Review AWS Services for cloudFiles Event Notifications on Databricks

Review Metadata Generated for cloudFiles Checkpointing on Databricks
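The lessons above switch Auto Loader from listing directories to file notifications, where AWS services deliver S3 events to a queue (on AWS, an SQS queue). The difference can be sketched in plain Python, assuming `queue.Queue` stands in for the notification queue and a set stands in for the checkpoint: new files are discovered by consuming events rather than by listing storage. Because S3 event notifications are delivered at least once, the sketch deduplicates keys. `drain_notifications` and the object keys are hypothetical; this is not Auto Loader's implementation.

```python
import queue
from typing import List, Set


def drain_notifications(events: "queue.Queue[str]", processed: Set[str]) -> List[str]:
    """Consume all pending 'object created' events and return keys not seen before.

    No directory listing happens: discovery is driven entirely by the events.
    """
    new_keys: List[str] = []
    while True:
        try:
            key = events.get_nowait()
        except queue.Empty:
            break
        if key not in processed:  # the same event may be delivered more than once
            processed.add(key)
            new_keys.append(key)
    return new_keys


events: "queue.Queue[str]" = queue.Queue()  # stands in for the notification queue
seen: Set[str] = set()                       # stands in for the checkpoint
events.put("landing/2021-01-13-0.json.gz")
events.put("landing/2021-01-13-0.json.gz")  # duplicate delivery of the same event
events.put("landing/2021-01-13-1.json.gz")
batch1 = drain_notifications(events, seen)  # distinct keys, in arrival order
batch2 = drain_notifications(events, seen)  # empty until new events arrive
```

The trade-off is extra cloud setup (the IAM role and event services covered in the lessons above) in exchange for not having to list storage on every run.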

Overview of Databricks SQL Clusters

As part of this section, we will get an overview of Databricks SQL Clusters.

Overview of Databricks SQL Platform – Introduction

Run First Query using SQL Editor of Databricks SQL

Overview of Dashboards using Databricks SQL

Overview of Databricks SQL Data Explorer to review Metastore Databases and Tables

Use Databricks SQL Editor to develop scripts or queries

Review Metadata of Tables using Databricks SQL Platform

Overview of loading data into retail_db tables

Configure Databricks CLI to push data into the Databricks Platform

Copy JSON Data into DBFS using Databricks CLI

Analyze JSON Data using Spark APIs

Analyze Delta Table Schemas using Spark APIs

Load Data from Spark Data Frames into Delta Tables

Run Ad-hoc Queries using Databricks SQL Editor to validate data

Overview of External Tables using Databricks SQL

Using COPY Command to Copy Data into Delta Tables

Manage Databricks SQL Endpoints
