In this video I explain how to use AWS Glue with the Apache Hudi format to build a data lake. This is a beginner-level video, and I have intentionally kept it simple so it is easy to follow.
It gives an overview of AWS Glue and Apache Hudi, followed by a demo that builds an SCD Type 1 (SCD-1) dimension table and shows how to run updates and inserts (upserts) on top of a Hudi table.
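For readers new to SCD-1, here is a minimal in-memory sketch of the semantics the demo relies on. It is not Hudi code and not the script from the video. The function name, the `id`/`city`/`ts` columns and the sample rows are hypothetical. Within the incoming batch it keeps the row with the largest precombine value per key, similar to Hudi's precombine field. It then overwrites the stored row for existing keys (update) and adds rows for new keys (insert). No history is kept, which is what makes it Type 1.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def scd1_upsert(existing: Iterable[Row], incoming: Iterable[Row],
                key: str, precombine: str) -> List[Row]:
    """Hypothetical, simplified model of an SCD Type 1 upsert."""
    # Step 1: deduplicate the incoming batch, keeping the latest row per key.
    latest: Dict[object, Row] = {}
    for row in incoming:
        k = row[key]
        if k not in latest or row[precombine] > latest[k][precombine]:
            latest[k] = row

    # Step 2: overwrite existing keys (update) and add new keys (insert).
    table: Dict[object, Row] = {row[key]: row for row in existing}
    table.update(latest)  # SCD-1: the new version replaces the old one
    return sorted(table.values(), key=lambda r: r[key])


# Hypothetical sample data: first load, then a second file with changes.
existing = [{"id": 1, "city": "Pune", "ts": 1},
            {"id": 2, "city": "Delhi", "ts": 1}]
incoming = [{"id": 2, "city": "Mumbai", "ts": 2},
            {"id": 2, "city": "Goa", "ts": 3},
            {"id": 3, "city": "Chennai", "ts": 2}]
result = scd1_upsert(existing, incoming, key="id", precombine="ts")
```

Here row 1 is untouched, row 2 is updated to its latest version (ts 3), and row 3 is inserted.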
Video Timeline:
00:00 Introduction
00:35 What is AWS Glue
00:50 Glue Data Catalog
01:25 Glue Crawlers
02:25 Glue Studio visual builder
02:45 Apache Hudi
04:12 Why I made this video
05:05 AWS Glue - serverless service
05:51 AWS Glue - Spark ETL
05:57 AWS Glue - Python Jobs
06:10 AWS Glue vs AWS Lambda for python jobs
06:46 AWS Glue Data Catalog
07:01 What is Metadata
07:20 Metadata - business, technical, operational
08:10 Glue Data Catalog - why is it so powerful
08:40 AWS Glue crawlers for automated data discovery
09:50 Do I use Glue crawlers a lot?
10:20 Glue Studio visual builder
10:42 Do I use the visual builder a lot?
12:14 Apache Hudi - open source data lake format
12:58 Data lake vs. data warehouse
14:22 Hudi ACID
14:44 Hudi versioning
14:52 Hudi integration with Glue, Athena, Redshift
15:37 Recap of the concepts before the demo
15:50 Demo (2 input files)
16:25 Demo - create Glue job
20:04 Demo - save the Glue job and run it
20:18 Demo - Glue job input arguments
21:02 Demo - Glue job script walkthrough
22:42 Demo - job complete, check table data
23:41 Demo - run second file for UPSERT (SCD-1)
25:38 Demo - Glue continuous driver logs
26:24 Demo - Hive-style partitioning in Hudi
27:01 Demo - 2nd run complete, check data
27:18 End of demo
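As a reference for the Glue job part of the demo, here is a sketch of the Hudi writer options typically used for an upsert with Hive-style partitioning and Glue Data Catalog sync. This is my own illustration, not the script from the video. The function name and example arguments are hypothetical. It also assumes a Copy-on-Write table and a Glue job that uses the Glue Data Catalog as its Hive metastore (hence `hive_sync.mode = hms`). The option keys are standard Hudi datasource configs, but check them against the Hudi version bundled with your Glue version.

```python
from typing import Dict


def hudi_upsert_options(table: str, database: str, record_key: str,
                        precombine_field: str, partition_field: str) -> Dict[str, str]:
    """Hypothetical helper: Hudi writer options for an SCD-1 style upsert."""
    return {
        "hoodie.table.name": table,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        # Key that identifies a row; an incoming row with the same key updates it.
        "hoodie.datasource.write.recordkey.field": record_key,
        # When a batch has duplicate keys, the row with the larger value wins.
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Folders are written as <field>=<value>/ instead of just <value>/.
        "hoodie.datasource.write.hive_style_partitioning": "true",
        # Register/refresh the table and partitions in the catalog after each write.
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.mode": "hms",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.partition_fields": partition_field,
    }


# Hypothetical example values.
opts = hudi_upsert_options("customer_dim", "demo_db",
                           "customer_id", "updated_at", "country")

# Inside a Glue job (Hudi enabled, e.g. via --datalake-formats hudi), the options
# would be passed to the Spark writer; "target_path" is a hypothetical job argument:
#   args = getResolvedOptions(sys.argv, ["JOB_NAME", "target_path"])
#   df.write.format("hudi").options(**opts).mode("append").save(args["target_path"])
```

Running the same job with the second input file and `mode("append")` keeps the existing table and lets Hudi apply the upsert, rather than overwriting the whole table.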
Would you be interested in an AWS data engineering session with me?
Download the resources used in this video:
[ Link ]
AWS cost for the demo: $2