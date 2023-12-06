The race for artificial intelligence (AI) has become a speed test. Google responded this Wednesday to the advances in ChatGPT, now in its fourth version, and to the string of announcements by large multinationals of their own systems, with the launch of Gemini, a multimodal artificial intelligence platform that can process and generate text, code, images, audio and video from different data sources. The Ultra version, “available early next year,” as announced by Eli Collins, vice president of products at Google DeepMind, surpasses humans in Massive Multitask Language Understanding (MMLU), an assessment benchmark built from 57 subjects spanning science, technology, engineering and mathematics (STEM), the humanities and the social sciences.

“Gemini is our largest and most capable AI model,” says Collins, who explains that it is “inspired by the way people understand the world and interact with it.” “It is perceived more as a useful collaborator and less as a clever piece of programming,” he says.

During the presentation, Gemini identified a geometric shape, analyzed the formula used to find its area, and spotted an error in it before proposing and explaining a correct result to the problem. In this way, it can return results from image, alphanumeric text and voice data. It also identified different shapes and drawings, some based only on scattered points, proposed uses for the figures or objects presented, developed a story based on alternative proposals, and produced updated graphics with information that the platform itself had searched for.

According to the DeepMind vice president, Gemini scored more than 90% on MMLU, the benchmark for multitask language understanding. “It is the first AI model to outperform human experts on this industry-standard benchmark,” he says. Gemini also scored 59.4% on the test of “understanding of multimodal tasks that include demands that require deliberate reasoning.”

Gemini is not an application but the platform that will bring this latest artificial intelligence model to existing services, from Bard, Google’s chat competitor to ChatGPT, to the search engine, service managers, Android phones and large-scale data centers.

To that end, three “sizes” of Gemini will be available: Nano, which Android developers can already use; Pro, which will be available from December 13; and Ultra, which will be available early next year on a date yet to be determined. Developers and enterprise customers will be able to access Pro through the Gemini API in Google AI Studio or Vertex AI. Through AICore, Android developers will also be able to create applications with Nano.

Bard

Sissie Hsiao, head of assistants and Bard, has announced that Gemini is now being added to the chatbot in English in 180 countries and will be extended progressively to other languages, although she has admitted that the company will have to confirm that its development is compatible with the imminent European regulation on artificial intelligence, which includes these dialogue platforms among the developments it covers. With its inclusion in Bard, Gemini will also be extended to all supported applications.

The rollout will take place in two phases: the first will use a Pro version, which will give the chatbot “more advanced reasoning, planning, understanding and other capabilities,” according to Hsiao; the second, at the beginning of next year, will bring improvements that culminate in the adoption of the Ultra version.

Gemini was built to be multimodal from the start. That is, it was not trained separately on different modalities of data with the resulting capabilities unified afterwards; its design is based on a diversity of sources from the outset. As Collins explains, “This helps Gemini seamlessly understand all types of input much better than existing models and its capabilities are state-of-the-art.”

It can also write code, including for complex developments. On this point, Amin Vahdat, vice president at Google Cloud, says: “In the future, we will see programmers using high-capacity AI models as collaborative tools that help with the entire software development process, from reasoning about problems to assisting with implementation, performance and capabilities.”

Regarding safety, Google says that Gemini passes “the most comprehensive evaluations of any model to date.” The company says it has tested the platform against all existing and potential risks and that it maintains a continuous review that includes “stress tests.” It has also applied its own AI principles, which set the ethical standards for its developments.

Despite the advances, Gemini is not infallible, as those responsible for it acknowledge. They admit that it will produce errors and hallucinations (confident-sounding responses not supported by data). “We’ve made a lot of progress and Gemini is our best model in that sense, but it is still, I would say, an unsolved research problem,” Collins admits.
