Date: 2023-05-29

Monterrey.- Meta has created Massively Multilingual Speech (MMS), an artificial intelligence (AI) language model that can recognize more than 4,000 spoken languages, 40 times more than any previously known technology, and produce audio in 1,100 of them.

In a statement, Facebook’s parent company highlighted that many of the world’s languages are in danger of disappearing, and that the limitations of current speech generation and recognition technology will only accelerate this trend.

“We want to make it easier for people to access information and use devices in their preferred language, and today we’re announcing a number of AI models that could help them do just that.”

Speech technology also has many use cases, from augmented and virtual reality to messaging services, which can be used in a person’s preferred language and can understand everyone’s voice.

“We are opening up our models and code so that others in the research community can build on our work and help preserve the world’s languages and bring the world closer together.”

For this model, Meta’s first challenge was to collect audio data for thousands of languages, because the largest existing speech datasets cover at most 100 languages.

To overcome this, the company turned to religious texts, such as the Bible, which have been translated into many different languages and whose translations have been extensively studied for text-based language translation research.

“As part of the MMS project, we created a data set of New Testament readings in over 1,100 languages that provided an average of 32 hours of data per language.”

By also drawing on unlabeled recordings of other Christian religious readings, Meta was able to increase the number of available languages to more than 4,000.

Going forward, Meta intends to expand MMS coverage to support even more languages and to address the challenge of handling dialects, which existing voice technology often struggles with.