Microsoft has built one of the most advanced artificial intelligence tools to date, but the company considers it too risky to release, which is why VALL-E 2 may never see the light of day.

The advanced technology is so compelling that the company refuses to share it with the public, citing “potential risks” of misuse. VALL-E 2 is a text-to-speech generator capable of imitating a voice from just a few seconds of audio.

The model works in what is known as a zero-shot setting: it can imitate a speaker it never encountered during training, using only a short sample of that person's voice. It is also the first of its kind to achieve “human parity,” meaning it meets or exceeds human-likeness standards.

According to Microsoft Research, VALL-E 2 can produce “accurate, natural speech in the exact voice of the original speaker, comparable to human performance.” In addition to short phrases, it can synthesize complex sentences.

To do this, the model relies on two techniques: Repetition Aware Sampling and Grouped Code Modeling.

Repetition Aware Sampling addresses the problems caused by repetitive tokens. Tokens are the smallest units of data that a language model can process; in a text model they are words or parts of words, while in a speech model like this one they are short encoded fragments of audio.

That is, it prevents sounds or phrases from being repeated during the decoding process, which helps to vary the system’s speech and make it sound more natural.
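
For readers who want to see the idea in code, here is a minimal Python sketch of repetition-aware sampling. It is an illustration only, not Microsoft's implementation: the function names, the default values for `top_p`, `window` and `threshold`, and the toy probabilities are all assumptions made for this example. The sketch first picks a token from the most likely candidates (nucleus sampling); if that token already dominates the most recent tokens, it draws again from the full distribution instead.

```python
import random


def nucleus_sample(probs, top_p, rng):
    """Sample a token id from the smallest set of tokens whose total mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def repetition_aware_sample(probs, history, top_p=0.8, window=10,
                            threshold=0.5, rng=random):
    """Pick the next token, resampling from the full distribution on heavy repetition.

    All defaults are illustrative assumptions, not published settings.
    """
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history)[-window:]
    if recent:
        ratio = recent.count(token) / len(recent)
        if ratio > threshold:
            # The candidate dominates the recent window, so draw again
            # from the whole distribution to give other tokens a chance.
            token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.9, 0.05, 0.05]  # toy distribution over three tokens
    history = []
    for _ in range(30):
        history.append(repetition_aware_sample(probs, history, rng=rng))
    print(history)
```

With the toy distribution above, plain nucleus sampling at `top_p=0.8` would return token 0 every time. The fallback is what lets the other tokens occasionally appear once token 0 starts to repeat.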

Microsoft says VALL-E 2 won’t be made public in the near future, as it is currently considered purely a research project.

There are currently no plans to incorporate VALL-E 2 into a product or to expand access to the public. Misuse of the model carries potential risks, such as spoofing voice identification or impersonating a specific speaker.

Microsoft says that suspected misuse of VALL-E 2 can be reported via the official site.

Vishing, a blend of “voice” and “phishing,” is a type of attack in which scammers impersonate friends, family, or other trusted people over the phone.

