Talk recorded at the Swiss Python Summit on October 18th, 2024.
Licensed as Creative Commons Attribution 4.0 International.
---------
Abstract:
Python developers and data enthusiasts attending the Swiss Python Summit, this talk is for you! Apache Spark is a powerful framework often used alongside Python for big data processing. You've seen its capabilities, but what powers its impressive performance?
Join me, Neil Gibbons, a Backend Engineer with a passion for distributed systems (and a recent MSc in Data Science!). I've also delivered talks at DevFest Mons 2022 and Birkbeck University.
In this session, we'll delve into the internal workings of Spark. We'll explore concepts like Resilient Distributed Datasets (RDDs), which are fundamental to Spark's fault tolerance. We'll see how Spark distributes tasks across a cluster and how PySpark lets you drive that parallelism from Python. Finally, we'll uncover the secrets of in-memory computation, the key to Spark's blazing speed.
Why attend? Gaining a deeper understanding of Spark's internals, especially within the Python ecosystem, empowers you to:
Optimize your Python big data applications for peak performance.
Troubleshoot issues more efficiently.
Write effective Spark code that unlocks its true potential and complements your Python expertise.
Whether you're a data scientist, developer, or simply curious about big data, this talk will bridge the gap between Python and Spark. Join me as we explore Spark's inner workings!
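To give a flavour of the RDD ideas the talk covers, here is a minimal conceptual sketch in plain Python. It is not Spark's actual API or implementation; the `SketchRDD` class is a hypothetical stand-in that illustrates two of the ideas mentioned above: transformations are lazy and are recorded as a lineage, and results can be (re)computed by replaying that lineage, which is the basis of RDD fault tolerance.

```python
# Conceptual sketch only (NOT Spark's real API): lazy transformations
# recorded as a lineage, replayed on demand by an action.

class SketchRDD:
    def __init__(self, source, lineage=()):
        self._source = source    # base data (or, in real Spark, a parent RDD)
        self._lineage = lineage  # recorded transformations, applied lazily

    def map(self, fn):
        # Transformations compute nothing; they just extend the lineage.
        return SketchRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        return SketchRDD(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # An action triggers computation by replaying the lineage from the
        # source -- the same mechanism lets Spark rebuild a lost partition.
        data = list(self._source)
        for op, fn in self._lineage:
            if op == "map":
                data = [fn(x) for x in data]
            elif op == "filter":
                data = [x for x in data if fn(x)]
        return data

rdd = SketchRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # squares of even numbers: [0, 4, 16, 36, 64]
```

In real Spark, the lineage additionally spans partitions across a cluster and drives scheduling and recovery; the talk digs into how that machinery actually works.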
---------------------
About the speaker(s):
Neil Gibbons is a Backend Engineer with a passion for big data. He's currently pursuing an MSc in Computer Science at the University of St Andrews. He's eager to share his experience and delve into Apache Spark's inner workings at the Swiss Python Summit.