Building a Distributed Collaborative Data Pipeline with Apache Spark