Meta gets a voice translator in 100 languages and brings us closer to the myth of the Babel Fish

Having a text translator available for up to 100 languages is impressive, but being able to translate directly from voice to voice is something that until now was only seen in science-fiction series and movies.

The utopia of the universal translator, a hitherto fictitious device, served in the fictional universe of Star Trek to communicate with aliens. In the novel The Hitchhiker’s Guide to the Galaxy, by Douglas Adams, the Babel Fish was a biological device in the shape of a fish that, placed in the ear, was capable of translating any language in real time. Today the myth is one step closer to becoming reality.

Meta, Mark Zuckerberg’s company, has published an article in the journal Nature presenting an artificial intelligence system capable of translating between multiple languages, both from and to text and from and to audio, as well as all their combinations.

Automatic translation before Meta

The first automatic translation systems were based on rules and statistical calculations, before the leap to neural machine translation back in 2016. Today we have enormous computing power that we can harness for machine learning through artificial neural networks applied to the creation of large language models: the same ones that are the basis of our beloved (and sometimes hated) ChatGPT.

Until now, most machine translators have translated from language X to language Y through an intermediate language for which there is plenty of training data. And yes, as expected, most translation systems use English as the intermediary. This is logical: if we have 100 languages and want to translate from each of them into every other, we would need 9,900 translators (100 × 99 ordered pairs, since every language must be combined with every other in both directions). However, if we use English as an intermediate language, many steps are saved and we only need 198 (99 into English + English into the other 99).
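The arithmetic above can be checked in a couple of lines (a quick sketch; the variable names are mine, not Meta's):

```python
n = 100  # number of languages

# Direct approach: one translator per ordered pair of distinct languages,
# since X -> Y and Y -> X are separate systems.
direct = n * (n - 1)

# Pivot approach: English is one of the 100 languages; each of the
# other 99 needs one translator into English and one out of it.
pivot = 2 * (n - 1)

print(direct, pivot)  # 9900 198
```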

The problem is that using an intermediate language requires two translation steps (from the source language to English and from English to the target language), and each step can introduce errors that accumulate.

Automatic translation today

Meta’s proposal is to carry out direct translations between two languages thanks to the use of a common representation space. That is, the text (or audio) is converted into a series of numerical values that represent it, so that a machine can process them.

In this multidimensional space, sentences with similar meanings will be close to each other, so it is possible to measure distances and perform calculations in it. What’s interesting is that Meta’s system is able to learn how to represent text and audio in that space regardless of the language they are in.

Imagine a multidimensional space where different sentences are organized according to their similarity to many others. A sentence in one language and its translation into another will be very close to each other; almost overlapping, we could say.
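As an illustration of how closeness can be measured in such a space, here is a toy sketch using cosine similarity. The three-dimensional vectors are invented for the example; real systems learn embeddings with hundreds of dimensions from data:

```python
import math

def cosine_similarity(u, v):
    """Similarity between two vectors: 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy embeddings for three sentences.
emb_en = [0.9, 0.1, 0.3]     # "The cat sleeps"
emb_es = [0.88, 0.12, 0.31]  # "El gato duerme" (its translation)
emb_other = [0.1, 0.9, 0.2]  # an unrelated sentence

# A sentence and its translation sit almost on top of each other,
# while the unrelated sentence lands far away.
print(cosine_similarity(emb_en, emb_es))     # close to 1.0
print(cosine_similarity(emb_en, emb_other))  # much lower
```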

Thanks to this data preprocessing, there is really no need to create new artificial neural networks that are much more complex than those we already have at our disposal. It’s all about using the available information intelligently.

Thanks to this, it is possible to perform some tasks, such as text-to-speech translation, for languages where sufficient training data is not available. That is, if we know how to translate text to text from language X to language Y, but we do not have examples of translations from text in language X to speech in language Y, the shared representation lets the system bridge that gap anyway.

Learning from zero examples

This is achieved through a learning technique called zero-shot, something like “zero-example learning.” Since both text and audio are represented in the same multidimensional space, it is possible to jump between one and the other.
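A toy sketch of the zero-shot idea, with invented stand-ins for the learned components: the mapping from text in language X into the shared space and the mapping from the shared space out to speech in language Y are each learned separately, yet composing them yields a text-to-speech pairing the system never saw during training:

```python
# Stand-ins for learned mappings into and out of a shared "concept" space.
# Note that no direct text(X) -> speech(Y) training example is needed.
encode_text_x = {"el gato duerme": "CONCEPT_CAT_SLEEPS"}        # learned from X text
decode_speech_y = {"CONCEPT_CAT_SLEEPS": "the_cat_sleeps.wav"}  # learned from Y speech

def zero_shot_text_to_speech(sentence: str) -> str:
    concept = encode_text_x[sentence]   # hop into the shared space
    return decode_speech_y[concept]     # hop out in the other modality

print(zero_shot_text_to_speech("el gato duerme"))  # the_cat_sleeps.wav
```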

Someone might object that text-to-speech is a solved problem and that many programs can already do an acceptable job. However, if I want to translate from speech to speech and decompose the problem into steps (speech in language X, then text in language X, then text in language Y, then speech in language Y), the process accumulates a long delay and ends up being unusable in a real setting. Being able to carry out the entire process in a single step makes the translation fluid.

Machine translation in the future

Despite all these advances, machine translation cannot be considered a solved problem. Many elements are still not being taken into account, such as vocal inflections and other emotional components that can affect the accuracy of the final translation, especially in speech-to-speech translation.

Problems can also arise when determining the grammatical gender of some words (such as professor, which has no gender in English but does in Spanish), since models tend to overgeneralize towards one specific gender.

But the hardest problem to solve is the lack of quality data to train these highly advanced artificial intelligence systems. Because of this, translation between minority languages (such as Zulu or Nyanja) remains a great challenge. The automatic translators of the future will have to take all this into account, and also be fast and energy efficient enough to be incorporated into our mobile devices.

Although translating between 100 languages may seem incredible, we are talking about only a small portion of the languages spoken in the world, which number more than 7,000. However, the goal (meta in Spanish, here with a lowercase m) of building the Tower of Babel seems a little closer every day.

This article was originally published in The Conversation. You can read it here.

