A replicable multilingual instruction-following model trained on 3.4M instructions in 52 languages, released as 52 monolingual models and one multilingual model.


This GitHub project, developed by the NLP department of Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), provides an instruction-following model trained on a dataset of 3.4 million instructions. The instructions span 52 languages, which is the basis for the model's multilingual coverage.

The project offers 53 models in total. Of these, 52 are monolingual models, each tuned to follow instructions in one specific language, which suits applications that target a single language. The remaining model is multilingual: it is trained to follow instructions in all 52 languages of the training set, so one model can serve users in several languages without switching between language-specific models.
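The choice between a monolingual model and the multilingual one can be sketched as a simple lookup with a fallback. This is only an illustration: the model identifiers, the `select_model` helper, and the language codes below are hypothetical placeholders, not names published by the project.

```python
from typing import Optional

# Hypothetical identifiers, for illustration only; not the project's real model names.
MULTILINGUAL_MODEL = "example-org/instruct-multilingual"
MONOLINGUAL_MODELS = {
    "en": "example-org/instruct-en",
    "ar": "example-org/instruct-ar",
    "zh": "example-org/instruct-zh",
    # ... one entry per supported language (52 in total)
}


def select_model(language: Optional[str]) -> str:
    """Return the monolingual model for a known language, else the multilingual one."""
    if language is None:
        # Language unknown or mixed: use the single multilingual model.
        return MULTILINGUAL_MODEL
    return MONOLINGUAL_MODELS.get(language.lower(), MULTILINGUAL_MODEL)


print(select_model("ar"))   # monolingual model for a listed language
print(select_model(None))   # multilingual model when the language is not known
```

A lookup like this keeps single-language deployments on a dedicated model while mixed-language or unlisted-language traffic falls back to the multilingual one.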

MBZUAI is an institution dedicated to AI research and education. Potential applications of the models include international communication, automated translation services, and AI-driven assistance for users across the 52 supported languages.

