The GT NLP Seminar is an interactive talk series held bi-weekly on Fridays from 12:30 pm to 1:30 pm, where students, faculty, and staff at Georgia Tech with an interest in Natural Language Processing meet, have lunch, and listen to talks about recent NLP research on a wide range of topics. Our speakers come from both inside and outside Georgia Tech, and usually give a 45-minute talk followed by a 15-minute Q&A and discussion session.
Learn more about the series: [ Link ]
"Cross-lingual Generalization, Alignment and Applications" is a talk given by Junjie Hu, a PhD student in the Language Technologies Institute, School of Computer Science, at Carnegie Mellon University (CMU), working with Jaime Carbonell and Graham Neubig.
Abstract:
While text on the web is an invaluable information source, this text is not available in large quantities for most languages in the world. It is also difficult to ask native speakers to annotate text in most languages for training individual machine learning models. With recent advances in multilingual machine learning models, we are able to transfer knowledge across languages in a single model, and apply the model to text written in more than 100 languages. However, a benchmark that enables the comprehensive evaluation of such models on a diverse range of languages and tasks is still missing. In this talk, I will focus on analyzing cross-lingual generalization effects in these models, and propose methods to improve performance in real applications. First, I will introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual models across 40 languages and 9 tasks. Second, I will show that a compact multilingual model trained on parallel translation text can align multilingual representations, performing on a par with or even better than much larger models on NLP tasks such as sentence classification and retrieval. Finally, I will present our recent translation initiative for COVID-19, a multilingual translation benchmark in 35 different languages, intended to foster the development of tools and resources for improving access to information about COVID-19 in these languages.