In this video I cover how to use PySpark with AWS Glue. Using the resources I have uploaded to GitHub, we work through a full tutorial on manipulating data and carrying out ETL tasks within the AWS Glue ecosystem. Don't worry if you are new to PySpark, AWS, or Glue: I guide you through everything step by step.
LINKS TO GITHUB TUTORIAL RESOURCES:
💾 Code Repo: [ Link ]
📈 Slides: [ Link ]
SUPPORT THE CHANNEL:
☕ Buy Me A Coffee: [ Link ]
🖥️ My VPN: [ Link ]
00:00 - Intro
00:46 - Set Up
08:41 - Run Our First PySpark Code - Read In Data Using A DynamicFrame
10:13 - Spark And PySpark Theory
19:53 - DynamicFrame PrintSchema
22:29 - DynamicFrame Count
23:30 - DynamicFrame Select
27:49 - DynamicFrame Drop Fields
31:02 - DynamicFrame Change Field Name
37:31 - DynamicFrame Filtering
41:39 - DynamicFrame Joining
47:29 - DynamicFrame Write To S3
54:12 - DynamicFrame Write To Glue Data Catalog
58:55 - Spark DataFrame Theory
01:00:25 - Convert To A Spark DataFrame
01:02:49 - Spark DataFrame Select Columns
01:04:31 - Spark DataFrame Add Columns
01:11:06 - Spark DataFrame Drop Columns
01:14:11 - Spark DataFrame Group By And Aggregate
01:15:58 - Spark DataFrame Filter And Where Clause
01:18:58 - Spark DataFrame Joins
01:24:21 - Spark DataFrame Write
01:36:20 - Outro
01:36:32 - Channel Supporters Shout Out
OTHER USEFUL LINKS:
📹 Glue Tutorial: [ Link ]
ℹ️ My Website: [ Link ]
🔗 Linkedin: [ Link ]
😎 About me
I have spent the last decade immersed in the world of big data, working as a consultant for some of the globe's biggest companies. My journey into the world of data was not the most conventional. I started my career working as a performance analyst in professional sport at the top levels of both rugby and football. I then transitioned into a career in data and computing. This journey culminated in the study of a Master's degree in Software
Enjoy 🤘