Google has introduced a new multilingual text vectorizer called RETVec (short for Resilient and Efficient Text Vectorizer) to help detect harmful content such as spam and malicious emails in Gmail.

In short, Google's filters have typically analyzed content in English only, or at most through translation; RETVec lets them become genuinely "multilingual" without going through translation.

How RETVec works

"RETVec is trained to be resilient against character-level manipulations, including insertions, deletions, typos, homographs, LEET substitutions, and more," according to the project's description on GitHub, which adds: "The RETVec model is trained on top of a novel character encoder that can encode all UTF-8 characters and words efficiently."
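To make the idea of a character encoder that covers "all UTF-8 characters" more concrete, here is a minimal sketch that encodes each character's Unicode code point as a fixed-width bit vector. It is an illustration under stated assumptions, not RETVec's actual encoder: the names `encode_char`, `encode_word` and `differing_rows`, the 24-bit width and the 16-character word length are all hypothetical choices for this example. Because every code point (up to U+10FFFF, which needs 21 bits) fits in the same width, no vocabulary is required, and a look-alike substitution changes only one row of the encoding.

```python
# Minimal sketch of a vocabulary-free character encoder.
# ASSUMPTION: illustrative only; this is NOT RETVec's real implementation.

BITS_PER_CHAR = 24  # hypothetical width; code points up to 0x10FFFF need 21 bits
MAX_CHARS = 16      # hypothetical fixed word length; longer words are truncated


def encode_char(ch: str) -> list[int]:
    """Return the Unicode code point of `ch` as bits, most significant first."""
    cp = ord(ch)
    return [(cp >> i) & 1 for i in range(BITS_PER_CHAR - 1, -1, -1)]


def encode_word(word: str) -> list[list[int]]:
    """Encode a word as MAX_CHARS rows of BITS_PER_CHAR bits, zero-padded."""
    rows = [encode_char(ch) for ch in word[:MAX_CHARS]]
    # Pad with all-zero rows (built separately so they are not aliased).
    rows.extend([0] * BITS_PER_CHAR for _ in range(MAX_CHARS - len(rows)))
    return rows


def differing_rows(a: str, b: str) -> int:
    """Count character positions whose encodings differ between two words."""
    return sum(ra != rb for ra, rb in zip(encode_word(a), encode_word(b)))


if __name__ == "__main__":
    # Latin "spam" vs. "spam" written with a Cyrillic "а" (U+0430):
    print(differing_rows("spam", "sp\u0430m"))  # prints 1
    # Any script works without a vocabulary, e.g. Chinese characters:
    print(len(encode_word("垃圾邮件")))  # prints 16
```

In a real system, a learned model sits on top of such an encoding; training it on manipulated examples is what would make nearby encodings map to similar embeddings.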

While large platforms such as Gmail and YouTube rely on text classification models to spot phishing attacks, inappropriate comments, and scams, threat actors are known to devise counter-strategies to evade these defenses.

Cybercriminals have been observed resorting to text manipulation, ranging from homographs to keyword stuffing to invisible characters.
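The following sketch shows why these tricks work against simple filters. It applies three hypothetical manipulations (LEET substitution, Cyrillic homoglyphs, and zero-width spaces) to a word and checks it against a naive exact-match keyword filter. The blocklist, the substitution tables and the function names are assumptions made up for illustration; real filters are far more elaborate.

```python
# Hypothetical character-level manipulations and a naive keyword filter.
# ASSUMPTION: tables, blocklist and names are illustrative only.

LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic look-alikes
ZERO_WIDTH_SPACE = "\u200b"  # invisible when rendered

BLOCKLIST = {"password", "free"}  # hypothetical list of flagged keywords


def leet(word: str) -> str:
    """Replace letters with similar-looking digits."""
    return "".join(LEET.get(c, c) for c in word)


def homoglyph(word: str) -> str:
    """Swap Latin letters for visually identical Cyrillic ones."""
    return "".join(HOMOGLYPHS.get(c, c) for c in word)


def invisible(word: str) -> str:
    """Insert a zero-width space between every pair of characters."""
    return ZERO_WIDTH_SPACE.join(word)


def naive_filter(text: str) -> bool:
    """Flag text if any whitespace-separated token is on the blocklist."""
    # str.split() does not split on U+200B, which is not a whitespace character.
    return any(tok.lower() in BLOCKLIST for tok in text.split())


if __name__ == "__main__":
    print(naive_filter("reset your password"))                 # True
    print(naive_filter("reset your " + leet("password")))      # False
    print(naive_filter("reset your " + homoglyph("password"))) # False
    print(naive_filter("get it " + invisible("free")))         # False
```

Two of the three variants look identical to a human reader, yet all of them slip past exact matching; this is the gap a manipulation-resilient vectorizer aims to close.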

RETVec, which works "out of the box" on more than 100 languages, aims to help build more resilient and efficient text classifiers, both server-side and on-device.

Vectorization is a natural language processing (NLP) technique that maps words or phrases from a vocabulary to corresponding numerical representations, which can then be used for further analysis such as sentiment analysis, text classification, and named entity recognition.
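As a point of comparison, here is a minimal bag-of-words vectorizer, one of the simplest forms of vectorization and not the approach RETVec takes. The function names and the toy corpus are hypothetical. It also exposes the weakness that character-level manipulation exploits: a slightly altered word is simply unknown to a fixed vocabulary.

```python
# Minimal bag-of-words vectorizer (a classic technique, not RETVec's method).
# ASSUMPTION: function names and corpus are hypothetical, for illustration only.


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign an index to every distinct lowercase token; 0 is reserved for unknowns."""
    vocab = {"<unk>": 0}
    for doc in corpus:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text: str, vocab: dict[str, int]) -> list[int]:
    """Return token counts as a vector of length len(vocab); unknown tokens go to index 0."""
    vec = [0] * len(vocab)
    for tok in text.lower().split():
        vec[vocab.get(tok, 0)] += 1
    return vec


if __name__ == "__main__":
    vocab = build_vocab(["win a free prize", "meeting at noon"])
    print(vectorize("free prize", vocab))  # [0, 0, 0, 1, 1, 0, 0, 0]
    print(vectorize("fr3e prize", vocab))  # [1, 0, 0, 0, 1, 0, 0, 0]
```

Changing a single character turns "free" into an unknown token, so the classifier downstream loses the signal entirely.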

"Thanks to its novel architecture, RETVec works out of the box on every language and all UTF-8 characters without the need for text preprocessing, making it the ideal candidate for on-device, web, and large-scale text classification deployments," said Google's Elie Bursztein and Marina Zhang.

The tech giant stated that integrating the vectorizer into Gmail improved the spam detection rate over the baseline by 38% and reduced the false positive rate by 19.4%. It also cut the model's tensor processing unit (TPU) usage by 83%.

"Models trained with RETVec exhibit faster inference speed due to its compact representation. Having smaller models reduces computational costs and decreases latency, which is critical for large-scale applications and on-device models," Bursztein and Zhang added.

Spam: a never-ending struggle?

Despite the significant progress made with technologies such as RETVec, the fight against spam appears to be a never-ending effort.

Attackers keep developing new tactics to evade defenses, pushing platforms like Gmail to roll out new solutions constantly to stay ahead in the battle against harmful content. The ongoing search for more effective ways to combat spam reflects the complexity of the digital landscape and the constant need for technology companies to adapt.

Spam is a major vector for phishing, malware, and more, and inexperienced users can easily fall into the trap. However sophisticated this kind of protection becomes, the weak link is always the one between the chair and the keyboard (or, nowadays, between the eyes and the smartphone).

So stay vigilant when opening your email, and don't click on links at random.