iFood is the largest food tech company in Latin America. We serve more than 26 million orders each month from more than 150 thousand restaurants, which generates large amounts of data every second: which dishes were ordered and by whom, each driver location update, and much more. To provide the best possible user experience and maximize the number of orders, we built several machine learning models that answer questions such as how long an order will take to complete, which restaurants and dishes to recommend to a consumer, and whether a payment is fraudulent. Generating the training datasets for those models, and serving features in real time so that the models can make correct predictions, requires efficient, distributed data processing pipelines. In this talk, we present how iFood built a real-time feature store using Databricks and Spark Structured Streaming to process event streams and store them in both a historical Delta Lake table and a low-latency Redis cluster, and how we structured our development process to do so with production-grade, reliable, and validated code.
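The dual-write idea in the abstract (one stream feeding both a historical store for training data and a low-latency store for serving) can be illustrated with a minimal pure-Python sketch. It is not iFood's implementation and it does not use Spark, Delta Lake, or Redis: the `history` list and `online` dict are in-memory stand-ins for those systems, and the `OrderEvent` schema, the `FeatureStore` class, and the two example features are hypothetical names chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical event shape; the talk does not specify a schema."""
    consumer_id: str
    order_value: float
    event_time: int  # seconds since epoch


class FeatureStore:
    """Toy dual-sink feature store (assumption: in-memory stand-ins only).

    history: append-only list standing in for the historical Delta Lake table.
    online:  dict standing in for the low-latency Redis cluster.
    """

    def __init__(self):
        self.history = []
        self.online = {}
        # Running per-consumer aggregates kept between micro-batches.
        self._state = defaultdict(lambda: {"count": 0, "total": 0.0})

    def process_batch(self, events):
        """Process one micro-batch: update aggregates, then write both sinks."""
        if not events:
            return
        touched = set()
        for event in events:
            state = self._state[event.consumer_id]
            state["count"] += 1
            state["total"] += event.order_value
            touched.add(event.consumer_id)
        as_of = max(event.event_time for event in events)
        for consumer_id in sorted(touched):
            state = self._state[consumer_id]
            row = {
                "consumer_id": consumer_id,
                "order_count": state["count"],
                "avg_order_value": state["total"] / state["count"],
                "as_of": as_of,
            }
            # Offline sink: keep every version so training sets can be rebuilt.
            self.history.append(dict(row))
            # Online sink: keep only the latest value for real-time serving.
            self.online[consumer_id] = row


store = FeatureStore()
store.process_batch([OrderEvent("c1", 30.0, 100), OrderEvent("c2", 10.0, 101)])
store.process_batch([OrderEvent("c1", 50.0, 200)])
```

After the two batches, `store.online["c1"]` holds only the latest feature values for that consumer, while `store.history` retains all three feature rows written so far. The split reflects the two needs named in the abstract: training wants the full history, serving wants a fast lookup of the current value.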
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.