As part of the course Apache Spark 2 using Python 3, this video covers accumulators, one of Spark's two kinds of shared variables. The next video covers the other kind, broadcast variables, along with repartition and coalesce.
itversity LMS course (CCA 175 Spark and Hadoop Developer – Python – 93 Days Lab):
[ Link ]
Our Udemy course (CCA 175 - Spark and Hadoop Developer - Python (pyspark)): [ Link ]
Full playlist for Apache Spark 2: [ Link ]
The following topics are covered in this session:
* Accessing HDFS APIs through sc (the SparkContext) in Python
* Validating input and output paths using HDFS APIs
* Joining orders and order_items read from HDFS
* Converting data from a local file into an RDD and joining it to get the product name
* Using accumulators to count the orders and order items processed
* Running on a cluster and checking the accumulators in the Spark UI
We conduct live sessions regularly on our YouTube channel. Please subscribe to get notified about them:
[ Link ]
For quick itversity updates, subscribe to our newsletter or follow us on social platforms.
* Newsletter: [ Link ]
* LinkedIn: [ Link ]
* Facebook: [ Link ]
* Twitter: [ Link ]
* Instagram: [ Link ]
* YouTube: [ Link ]
#Python #PySpark #Spark2 #itversity #Spark #DataEngineering
Join this channel to get access to perks:
[ Link ]