More than 7,100 languages are spoken around the world, yet the vast majority of language models support only English.
This makes it incredibly challenging to build products and projects that rely on multilingual language understanding. In this talk, David addresses the challenges faced in NLP research and development for African languages, which are spoken by over a billion people.
David will also share his findings on human-annotated named entity recognition (NER) datasets and on the development of multilingual pre-trained language models (PLMs) for 20 widely spoken African languages through multilingual adaptive fine-tuning (MAFT).
==
Check out David’s work here: [ Link ]
Follow him here: [ Link ]
How to join Masakhane: [ Link ]
==
Join the Cohere Discord: [ Link ]
Discussion thread for this episode (feel free to ask questions):
[ Link ]
==
0:00 Introducing David Adelani
2:42 Progress of Language Technology in English
4:57 When Language Technology is Needed Urgently
5:52 Why Research on Other Languages
7:31 Not Many Languages Benefit from NLP
8:39 What are Under-resourced Languages
13:00 NLP for Under-resourced African Languages
15:37 Challenges for NLP in African Languages
18:50 Developing labelled datasets for African Languages
21:31 About The Masakhane Research Community
22:35 MasakhaNER - Named Entity Recognition
31:11 Improving Pre-trained Language Models - Language Adaptive Fine-tuning (LAFT)
40:48 Multilingual Adaptive Fine-Tuning (MAFT)
52:12 Conclusion
54:00 Masakhane and how people can get involved
58:45 Applying these techniques to low-resourced Asian languages