Meta has built its own modern Tower of Babel with SeamlessM4T, an artificial intelligence model that can translate and transcribe speech and text in 101 languages. The dream of the Babel fish, the universal translator from the famous franchise The Hitchhiker's Guide to the Galaxy, could be closer to becoming a reality.
The technology, developed by Meta, Mark Zuckerberg's company and the owner of Facebook, Instagram and WhatsApp, promises to break down the language barriers that complicate multilingual communication. According to the study published in the journal Nature, the model enables instant translation from speech to speech or from text to speech, and vice versa, imitating the expression and tone of the speakers.
SeamlessM4T (Massively Multilingual and Multimodal Machine Translation) outperforms traditional cascaded translation systems, which chain separate models together, by integrating everything into a single unified model, improving translation quality by 8% to 23%. It is also noticeably more robust to background noise and variations in speech, with a 50% improvement in its ability to cope with these challenges.
"We evaluated SEAMLESSM4T for added toxicity and gender bias to assess the safety of the translations. In the case of toxicity, we include two mitigation strategies, which work either during training or at inference time," the authors write in the article 'Joint speech and text machine translation for up to 100 languages', published in Nature.
The project, led by researcher Marta Costa-jussà of Meta's artificial intelligence division FAIR (Fundamental AI Research), trained the model on one million hours of open speech audio, which allows it to translate even between language combinations that were not explicitly included in its training.
Meta has decided to make the model and its data available to the public for non-commercial use, in order to promote research and development in the field of speech translation.
Despite this progress, SeamlessM4T faces significant challenges. In critical contexts such as medicine and law, where precision is vital, the translation of proper names and colloquial expressions, gender bias and accent recognition all still need improvement. Even so, the technology marks a crucial step towards more fluid global communication and reinforces Meta's position in the field of personal communications.
Hours of speech audio and human translations
Machine translation has advanced significantly in recent decades, largely thanks to neural networks trained on large volumes of data. While data is abundant for the most widely spoken languages, such as English, it is scarce for many others, which has limited the reach of machine translation. "This affects the languages that are least represented on the Internet," writes Allison Koenecke, a computer scientist at Cornell University, in a News & Views article in Nature.
The Meta team built on its previous work in speech-to-speech translation, as well as on a project called No Language Left Behind, which focused on text-to-text translation for around 200 languages. Through this work, the researchers found that making translation systems multilingual can improve performance, even for languages with little available data, although the reason for this effect is still unclear.
To train the model, the team collected millions of hours of speech audio, along with their human translations, from the Internet and other sources, such as the United Nations archives. Transcripts of those speeches were also used.
The team also used this reliable data to teach a model to recognize pairs of matching content, which allowed it to automatically pair around half a million hours of audio with text, linking fragments in one language with their counterparts in others.