Building data pipelines with #python is an important skill for data engineers and data scientists. But what's the best library to use? In this video we look at three options: Pandas, Polars, and Spark (PySpark).
Timeline:
00:00 Data Pipelines
01:11 The Data
02:32 Pandas
04:34 Polars
06:15 PySpark
09:15 Spark SQL
Follow me on Twitch for live coding streams: [ Link ]
My other videos:
Speed Up Your Pandas Code: [ Link ]
Intro to Pandas video: [ Link ]
Exploratory Data Analysis Video: [ Link ]
Working with Audio data in Python: [ Link ]
Efficient Pandas Dataframes: [ Link ]
* YouTube: [ Link ]
* Discord: [ Link ]
* Twitch: [ Link ]
* Twitter: [ Link ]
* Kaggle: [ Link ]
#python #polars #spark #dataengineering
The BEST library for building Data Pipelines...
Tags
data science, data pipeline, big data, how to build a data pipeline, data pipelines, data analytics, apache spark, spark sql, pandas, pandas python, polars, polars data science, data science pipelines, data processing, spark vs polars, polars vs spark, data engineering, rob mulla, data engineering pipelines, data engineering tutorials, data pipeline architecture, data warehouse, big data engineer, data pipeline using spark