Apache Spark is one of the best available tools for machine learning at scale. In this talk, we will discuss how we identified anomalies in unstructured, time-series plain-text data such as news articles, tweets, and publications. In particular, using an architecture built on top of Spark, we were able to infer insights about time-series data of interest, including, but not limited to:
1. Generating future predictions (e.g., predicting the volatility in commodity prices)
2. Automatically organizing named entities into logical groups (e.g., Bill Gates and Jeff Bezos group together into Tech CEOs)
3. Identifying shifts in public semantics that pertain to certain entities (e.g., articles mentioning Donald Trump have significantly deviated from prior topics, indicating that an event has occurred)
4. Producing an explainable audit of any created predictions or detected semantic shifts (e.g., “articles about Vladimir Putin use the keywords Chechnya and weapons more often than normal, which is consistent with predicting that oil prices will be chaotic”)
As a result, we are able to combine predictive insights with explanatory information, all at scale, leveraging the processing power of the Spark engine with its built-in data science machinery.
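To make the idea of a detected semantic shift with an explainable audit more concrete, here is a minimal sketch in plain Python. It is not the architecture presented in the talk and it does not use Spark: it assumes a simple approach in which the keyword distribution of recent articles about an entity is compared with a baseline distribution using Jensen-Shannon divergence, and the keywords whose share grew the most serve as the explanation. The function names, the tiny vocabulary, and the sample articles are all hypothetical.

```python
import math
from collections import Counter


def keyword_distribution(articles, vocabulary):
    """Smoothed relative frequency of each vocabulary word across the articles."""
    counts = Counter(
        word
        for text in articles
        for word in text.lower().split()
        if word in vocabulary
    )
    # Add-one smoothing so no keyword ends up with zero probability.
    total = sum(counts.values()) + len(vocabulary)
    return {word: (counts[word] + 1) / total for word in vocabulary}


def kl_divergence(p, q):
    """Kullback-Leibler divergence (base 2) of p from q, over the same keys."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2): 0 means identical, 1 is the maximum."""
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def explain_shift(baseline, recent, top_n=2):
    """Keywords whose share grew the most: a simple audit of the shift."""
    growth = {w: recent[w] - baseline[w] for w in baseline}
    return sorted(growth, key=growth.get, reverse=True)[:top_n]


# Hypothetical toy data: articles about one entity, before and after an event.
vocabulary = {"election", "economy", "chechnya", "weapons"}
baseline_articles = ["election economy election", "economy election economy"]
recent_articles = ["chechnya weapons chechnya", "weapons economy chechnya"]

baseline = keyword_distribution(baseline_articles, vocabulary)
recent = keyword_distribution(recent_articles, vocabulary)

shift_score = js_divergence(baseline, recent)    # larger means a stronger shift
top_keywords = explain_shift(baseline, recent)   # keywords driving the shift
```

In a setting like the one described above, the same per-entity comparison could presumably be computed in parallel over many entities and time windows, with a threshold on the shift score (an assumption here, not something stated in the talk) deciding when to flag an anomaly.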
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms.