See the full post here: [ Link ]
This talk covers the rapid development of high-performance, scalable text processing solutions for tasks such as classification, semantic analysis, topic modeling, and general machine learning. We demonstrate how Python modules, in particular the Rosetta Python library, can be used to process, clean, and tokenize large volumes of text data, extract features from it, and finally build statistical models. The Rosetta library focuses on small, simple modules (each with a command-line interface) that use very little memory and are parallelized with the multiprocessing package. We will touch on latent Dirichlet allocation (LDA) topic modeling and two implementations of it, Vowpal Wabbit and Gensim. The talk will be part presentation and part "real life" example tutorial.
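As a rough illustration of the small, low-memory, multiprocessing-based style described above, here is a minimal sketch that uses only the Python standard library. It does not use Rosetta's API: the names `tokenize`, `count_tokens`, and `term_counts`, as well as the simple regex tokenizer, are assumptions made for this example.

```python
import re
from collections import Counter
from multiprocessing import Pool

# Assumption: a token is a run of lowercase ASCII letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def count_tokens(text):
    """Return the token counts for a single document."""
    return Counter(tokenize(text))


def _merge(counters):
    """Fold an iterable of Counters into one, one document at a time."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return total


def term_counts(docs, processes=1, chunksize=100):
    """Count tokens over an iterable of documents.

    With processes > 1 the documents are tokenized in worker processes.
    imap yields results lazily, so only the running total is kept in
    memory rather than every per-document Counter.
    """
    if processes > 1:
        with Pool(processes) as pool:
            return _merge(pool.imap(count_tokens, docs, chunksize=chunksize))
    return _merge(map(count_tokens, docs))
```

Because `docs` can be any iterable, a generator that reads one document per line from a file can be passed in, which keeps memory use low for large corpora. When using `processes > 1` in a script, call `term_counts` from under an `if __name__ == "__main__":` guard so that worker processes can be started safely on every platform.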
ABOUT DATA COUNCIL:
Data Council ([ Link ]) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open-source projects and startups.
FOLLOW DATA COUNCIL:
Twitter: [ Link ]
LinkedIn: [ Link ]
Facebook: [ Link ]
Eventbrite: [ Link ]