ETL pipelines are ubiquitous in data-warehousing and lakehouse ecosystems, building the raw and derived tables that drive business decisions from a plethora of fragmented data sources. As companies invest in data applications for richer user experiences, ETL pipelines must meet a modern-day challenge: delivering low-latency analytics.
In this lightning talk, we’ll focus on how to build efficient Apache Spark™ ETL pipelines for lakehouses that let you effectively ingest and utilize streaming data. We’ll cover:
- Leveraging an incremental processing framework to improve compute efficiency
- Indexing data to deliver low-latency analytics
- Applying advanced concurrency control mechanisms to improve throughput
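The incremental-processing idea in the first bullet can be sketched in plain Python: rather than rescanning the whole table on every run, each batch consumes only records newer than the last committed checkpoint, which is the same principle Spark Structured Streaming and incremental lakehouse table formats rely on. The function name, the offset-based log, and the uppercase transform below are illustrative assumptions, not details from the talk.

```python
# Conceptual sketch of incremental processing (illustrative, not the
# talk's actual pipeline): each run processes only records past the
# last committed checkpoint instead of recomputing the full dataset.

def run_incremental_batch(records, checkpoint):
    """Process only records with an offset greater than the checkpoint.

    records: list of (offset, payload) tuples, ordered by offset.
    checkpoint: highest offset already committed (None on the first run).
    Returns (processed_payloads, new_checkpoint).
    """
    start = -1 if checkpoint is None else checkpoint
    new_records = [(off, p) for off, p in records if off > start]
    if not new_records:
        # Nothing new arrived; keep the old checkpoint unchanged.
        return [], checkpoint
    processed = [p.upper() for _, p in new_records]  # stand-in transform
    return processed, new_records[-1][0]

# First run processes everything; the second sees only the new record.
log = [(0, "a"), (1, "b")]
out1, ckpt = run_incremental_batch(log, None)   # processes offsets 0 and 1
log.append((2, "c"))
out2, ckpt = run_incremental_batch(log, ckpt)   # processes only offset 2
```

In a real Spark pipeline the checkpoint bookkeeping is handled for you (e.g. via a stream's checkpoint location), but the compute savings come from exactly this pattern: work is proportional to new data, not table size.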
Talk by: Nadine Farah
Here’s more to explore:
Rise of the Data Lakehouse: [ Link ]
Lakehouse Fundamentals Training: [ Link ]
Connect with us: Website: [ Link ]
Twitter: [ Link ]
LinkedIn: [ Link ]
Instagram: [ Link ]
Facebook: [ Link ]