Google Gemini is the new multimodal model that redefines artificial intelligence

From the timid beginning to the launch of Gemini: Google's breakthrough in the era of generative AI. Let's discover the particularities of the multimodal chatbot.

Not even a decade ago, tiny bits of machine learning quietly crept into the digital lives of all of us.
We are mostly talking about small “tricks”, such as the identification of subjects in a camera viewfinder or sentence suggestions of questionable usefulness.

Today, as generative artificial intelligence approaches a peak, the buzz around it is growing ever louder. It is in this scenario that Google raises the bar with its new “multimodal” model called Gemini.

Google unveiled Gemini on December 6, 2023, offering it in three sizes: Ultra, the most powerful, which for now is held back from widespread commercial use; Pro; and Nano, the latter designed to run on mobile devices.

In recent years, the search giant has struggled to respond to the hype around OpenAI, GPT and the potential threats that AI-powered services presented to its core business.

Because these services can draw on a huge amount of information from the Internet, users could get the answers they need with a single question on a single web page.

Above all, this would make everything easier and quicker than a Google search.
That prospect raises concern in Mountain View, especially considering how many eyes could slip past the adverts for which customers pay considerable sums.

Between myths and false gods

Until now, large language models (LLMs) and similar generative models have worked by analyzing input in one medium and producing output in a given media format.

For example, OpenAI's Generative Pre-trained Transformer (GPT) model handles text-to-text exchanges, while DALL-E translates text prompts into images.
Each model is tuned for one type of input and one type of output.

Here's where the multimodality talk comes in: Gemini can receive text (including code), images, video, and audio, and, with some direction, return something new in any of these formats.

In other words, a multimodal LLM can theoretically perform the tasks of several dedicated single-purpose models.

Google's video presentation gives an idea of how refined interactions can be with a decently trained model of this type.

However, a word of warning is in order: the video in question, and above all its elegant editing, can easily be misleading.
In reality, none of these interactions happens as quickly as it appears on screen.

As Google itself admitted, the video demonstration was not performed in real time with voice prompts; instead, still frames from the raw footage were used, and text prompts were then entered, to which Gemini responded.

The intent was to showcase Gemini's multimodal capabilities, including its native ability to offer spoken, conversational suggestions based on image recognition.

This would set Google's offering substantially apart from other chatbots.
What is unique is the prospect it offers: an individual could hold a fluid voice conversation with Gemini while it observes and responds in real time to what is happening around them.

Small previews

A variant of this model, called Gemini Pro, is already available, integrated into the Bard chatbot.

Owners of a Pixel 8 Pro, Google's smartphone, can already use a version of Gemini, Nano, to generate AI-suggested text replies on WhatsApp, and soon also on Gboard, the virtual keyboard developed by the Californian company.

At the moment, only a pared-down version of Gemini is available in Bard, but it still represents a significant step forward compared to the original Bard, which was limited to text input.

It should be noted that, currently, Gemini is only available in English, but Google plans to introduce support for other languages in the near future.
As with Google's previous generative AI updates, Gemini Pro is not yet available in the European Union.

To access Gemini Pro from the European Union, you need to use a VPN that provides an IP address from a country where Gemini is already available, such as the United States or Australia; after that, all you need is a Google account.
