In this video, we explore the exciting world of data lakes. A data lake is an essential component of the modern data stack. We previously built a data lake in AWS using S3, Glue, and Athena. But what if we want to deploy our own data lake with open-source tools on our own infrastructure? Here we deploy an on-premise data lake built from open-source technologies. This way we can learn the technologies behind a data lake, and most cloud offerings use the same technologies.
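A data lake separates three roles: object storage holds raw files (S3, or MinIO on-premise), a catalog records which files form which table and with what schema (Glue on AWS), and a query engine applies that schema when it reads the files (Athena on AWS). The toy sketch below illustrates this "schema-on-read" split using only the Python standard library. The in-memory `object_store`, the `catalog` dict, the `orders` table, and its CSV contents are all hypothetical stand-ins, not the setup used in the video.

```python
import csv
import io

# Hypothetical stand-in for an S3/MinIO bucket: object key -> raw file bytes.
object_store = {
    "sales/orders/part-0001.csv": b"order_id,amount\n1,10.5\n2,4.5\n",
    "sales/orders/part-0002.csv": b"order_id,amount\n3,5.0\n",
}

# Hypothetical stand-in for a metadata catalog (the role Glue plays on AWS):
# table name -> location prefix plus (column name, type converter) pairs.
catalog = {
    "orders": {
        "location": "sales/orders/",
        "columns": [("order_id", int), ("amount", float)],
    }
}


def scan_table(table_name):
    """Schema-on-read: find the table's files via the catalog and parse them at query time."""
    table = catalog[table_name]
    rows = []
    for key in sorted(object_store):
        # Only files under the table's location prefix belong to the table.
        if not key.startswith(table["location"]):
            continue
        reader = csv.DictReader(io.StringIO(object_store[key].decode("utf-8")))
        for record in reader:
            # Apply the catalog's schema to the raw strings read from the file.
            rows.append({name: cast(record[name]) for name, cast in table["columns"]})
    return rows


# Toy "query engine" step, equivalent to: SELECT sum(amount) FROM orders
total = sum(row["amount"] for row in scan_table("orders"))
print(total)  # 20.0
```

The files themselves never change; adding a new `part-*.csv` under the prefix makes its rows appear in the next scan, which is the property that lets engines like Athena query data sitting in a bucket.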
What is a Data Lake? [ Link ]
Link to GitHub repo: [ Link ]
💥 Subscribe to our channel:
[ Link ]
📌 Links
-----------------------------------------
#️⃣ Follow me on social media! #️⃣
🔗 GitHub: [ Link ]
📸 Instagram: [ Link ]
📝 LinkedIn: [ Link ]
🔗 [ Link ]
-----------------------------------------
#dataanalytics #datalake #opensource
Topics covered in this video:
==================================
0:00 - Introduction to Data Lake
1:36 - Tech Stack of an On-Premise Data Lake
1:49 - Docker Containers Overview
3:26 - Data Lake Configurations
4:48 - Start Docker Containers
5:59 - MinIO (S3) Bucket and File(s)
6:53 - Mapping Files to a SQL Table
7:12 - Trino Cluster
7:32 - Trino SQL Engine Connection
8:37 - Create Schema
9:03 - Create Table
9:36 - Query External Table
10:12 - SQL Analysis
10:29 - Data Lake Tech Review
11:51 - Coming Soon
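The schema, table, and query steps in the chapter list (8:37 to 10:12) boil down to a few Trino SQL statements. The minimal Python sketch below only builds those statements as strings, so it runs without a cluster. It assumes Trino's Hive connector is mounted as a catalog named `hive`, that MinIO is reached through `s3a://` paths, and that the data is a CSV file of orders in a bucket named `sales`; these names and the dataset are hypothetical, not taken from the video or its repo.

```python
def create_schema_sql(catalog: str, schema: str, bucket: str) -> str:
    """CREATE SCHEMA statement whose default location is a (hypothetical) MinIO bucket."""
    return (
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema} "
        f"WITH (location = 's3a://{bucket}/')"
    )


def create_external_table_sql(
    catalog: str,
    schema: str,
    table: str,
    columns: list[tuple[str, str]],
    location: str,
    file_format: str = "CSV",
    skip_header_lines: int = 1,
) -> str:
    """CREATE TABLE statement that maps existing files under `location` to a SQL table."""
    fmt = file_format.upper()
    # The Hive connector reads CSV-format tables only as VARCHAR columns.
    if fmt == "CSV" and any(col_type.upper() != "VARCHAR" for _, col_type in columns):
        raise ValueError("CSV tables must declare every column as VARCHAR")
    column_defs = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    properties = [f"external_location = '{location}'", f"format = '{fmt}'"]
    if fmt == "CSV" and skip_header_lines > 0:
        # Skip the header row(s) at the top of each CSV file.
        properties.append(f"skip_header_line_count = {skip_header_lines}")
    props = ",\n  ".join(properties)
    return (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table} (\n  {column_defs}\n)\n"
        f"WITH (\n  {props}\n)"
    )


if __name__ == "__main__":
    statements = [
        create_schema_sql("hive", "sales", "sales"),
        create_external_table_sql(
            "hive",
            "sales",
            "orders",
            columns=[("order_id", "VARCHAR"), ("amount", "VARCHAR")],
            location="s3a://sales/orders/",
        ),
        # CSV columns are VARCHAR, so cast before aggregating.
        "SELECT sum(CAST(amount AS DOUBLE)) AS revenue FROM hive.sales.orders",
    ]
    for sql in statements:
        print(sql + ";\n")
```

Because the table is external, dropping it removes only the catalog entry and leaves the files in the bucket. The CSV-only-VARCHAR restriction is why the sketch casts in the query; a columnar format such as Parquet would allow typed columns directly. The printed statements could be pasted into the Trino CLI or any SQL client connected to the cluster.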