Delta Live Tables is a new framework in Databricks that aims to accelerate building data pipelines by providing out-of-the-box scheduling, dependency resolution, data validation, and logging.
We'll cover the basics, then get into the demos to show how we can:
- Set up a notebook to hold our code and queries
- Ingest quickly and easily into bronze tables using Auto Loader
- Create views and tables on top of the ingested data using SQL and/or Python to build our silver and gold layers
- Create a pipeline to run the notebook
- See how we can run the pipeline as either a batch job or as a continuous job for low-latency updates
- Use APPLY CHANGES INTO to upsert changed data into a live table
- Apply data validation rules to our live table definitions, and get detailed logging on how many records violated each rule on every run
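As a sketch of the ingestion demo, a bronze table fed by Auto Loader with a silver table built on top might look like the following DLT SQL. The table names and source path here are illustrative assumptions, not taken from the session:

```sql
-- Bronze: incrementally ingest raw JSON files with Auto Loader (cloud_files)
CREATE OR REFRESH STREAMING LIVE TABLE bronze_orders
COMMENT "Raw orders ingested from cloud storage"
AS SELECT * FROM cloud_files("/mnt/landing/orders", "json");

-- Silver: a cleaned, typed table built on the bronze table;
-- the LIVE. prefix lets DLT resolve the dependency automatically
CREATE OR REFRESH LIVE TABLE silver_orders
COMMENT "Typed orders ready for downstream use"
AS SELECT CAST(order_id AS BIGINT) AS order_id,
          CAST(order_ts AS TIMESTAMP) AS order_ts,
          amount
   FROM LIVE.bronze_orders;
```

Because the silver table references `LIVE.bronze_orders`, DLT infers the dependency graph and runs the two in the right order without any explicit orchestration.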
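The APPLY CHANGES INTO demo can be sketched roughly as below: declare the target streaming live table, then upsert a change feed into it keyed and ordered by a sequence column. Again, the names and the SCD Type 1 choice are assumptions for illustration:

```sql
-- The upsert target must be declared before APPLY CHANGES can write to it
CREATE OR REFRESH STREAMING LIVE TABLE silver_customers;

-- Upsert change rows into the live table, keeping the latest row per key
APPLY CHANGES INTO LIVE.silver_customers
FROM STREAM(LIVE.bronze_customer_changes)
KEYS (customer_id)
SEQUENCE BY change_ts
STORED AS SCD TYPE 1;
```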
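Data validation rules are declared as expectations on the table definition. A minimal sketch, with hypothetical rule and column names: each CONSTRAINT tags rows that fail its predicate, ON VIOLATION DROP ROW excludes them, and the pipeline event log records how many rows each rule dropped on every run:

```sql
-- Gold table with two expectations: the first drops violating rows,
-- the second only counts them in the event log
CREATE OR REFRESH LIVE TABLE gold_orders (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT positive_amount EXPECT (amount > 0)
)
AS SELECT order_id, order_ts, amount FROM LIVE.silver_orders;
```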
By the end of the session you should have a good view of whether this can help you build out your next data project faster and make it more reliable.
Speaker: Niall Langley [ Link ]
Speaker Blog: [ Link ]
Speaker BIO: Niall has been building data solutions on the Microsoft platform for 12 years. In the past few years Niall has been focused on helping clients with data engineering in Azure.
Niall is active in the data community, helping run the Bristol user group.
[ Link ]
Tags: Azure,Python,Spark,Developing,Databricks,Big Data & Data Engineering