Building a Unified Data Pipeline with Apache Spark and XGBoost with Nan Zhu