Scale By the Bay 2019 is held on November 13-15 in sunny Oakland, California, on the shores of Lake Merritt. Join us!
-----
The Common Crawl corpus contains petabytes of web crawl data and is a treasure trove for potential experiments. To introduce you to the possibilities that web crawl data offers for NLP, we will take a detailed look at how the data has been used in various experiments and how to get started with the data yourself.
Stephen Merity is responsible for crawling billions of pages a month at Common Crawl, a non-profit that provides petabytes of web data free of charge. Prior to joining Common Crawl, Stephen worked with Freelancer.com and Grok Learning in Australia. He holds a Master's in CSE from Harvard University and a Bachelor's (Honours) in NLP from the University of Sydney.