In this video, we'll explore how Apache Spark’s architecture enables efficient handling of complex, large datasets and uncover the powerful enhancements Azure Synapse brings to Spark. You'll get hands-on with PySpark in Azure Synapse, walking through exercises for data analysis and engineering. We'll also tackle real-world challenges, demonstrating practical solutions to optimize your Spark workflows.
Whether you’re new to PySpark or looking to refine your skills in Azure Synapse Analytics, this session has you covered!
👉 Don’t forget to Like, Share, and Subscribe for more exciting content!
🌐 User Group: Edmonton Microsoft Data Professionals
Join the User Group: [ Link ]
👨‍💻 Speaker Profile: Wilson Mok – Microsoft MVP – Data Platform
Connect with Wilson: [ Link ]
📝 Code Repo: [ Link ]
Chapters
0:00:00 Introduction
0:04:01 Recap from last session
0:05:19 Let's dive into Apache Spark architecture
0:09:57 Azure Synapse Spark features
0:16:48 Q&A #1
0:24:00 Shuffling data between workers
0:28:43 Issues with Data Skewing
0:32:00 Demo
1:04:00 Summary + Q&A #2