Community adoption of Kubernetes (instead of YARN) as a scheduler for Apache Spark has been accelerating since the major improvements from Spark 3.0 release. Companies choose to run Spark on Kubernetes to use a single cloud-agnostic technology across their entire stack, and to benefit from improved isolation and resource sharing for concurrent workloads. In this talk, the founders of Data Mechanics, a serverless Spark platform powered by Kubernetes, will show how to easily get started with Spark on Kubernetes.
We will go through an end-to-end example of building, deploying and maintaining an end-to-end data pipeline. This will be a code-heavy session with many tips to help beginners and intermediate Spark developers be successful with Spark on Kubernetes, and live demos running on the Data Mechanics platform.
Included topics:
– Setting up your environment (data access, node pools)
– Sizing your applications (pod sizes, dynamic allocation)
– Boosting your performance through critical disk and I/O optimizations
– Monitoring your application logs and metrics for debugging and reporting
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: [ Ссылка ]
See all the previous Summit sessions: [ Ссылка ]
Connect with us:
Website: [ Ссылка ]
Facebook: [ Ссылка ]
Twitter: [ Ссылка ]
LinkedIn: [ Ссылка ]
Instagram: [ Ссылка ] Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here. [ Ссылка ]
Ещё видео!