Gemini, Google's artificial intelligence (AI) assistant, now incorporates technology from Project Astra, a platform that allows AI to obtain multimodal context through video. Thanks to this integration, the chatbot can now "see" and interpret the information displayed on a smartphone's screen.
The update lets the assistant read the content shown on the screen, so users can query Gemini in real time about what they see while browsing the internet, playing a video game, or exploring any multimedia material.
Gemini Live has also expanded its capabilities to access the mobile device's camera, so users can interact with the assistant based on what the camera captures in the moment. These functions are activated by a button integrated into the Gemini interface. For now, they are available only in English and only to a select group of Gemini Advanced subscribers on the Google One AI Premium plan.
The new improvements are built on Project Astra technology, announced by Google DeepMind last year. The platform was developed to enable AI systems to "understand and respond to the complex and dynamic world as people do." It gives the algorithms the ability to take in and remember what they "see and hear," which allows them to better understand context and respond more precisely to user requests.
The system draws on Google's most advanced AI models alongside others designed for specific tasks. This combination lets the bots process information more quickly by continuously encoding video frames, combining video and voice inputs into a single timeline of events, and caching that data for efficient retrieval and later use.
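Google has not published Astra's implementation, but the pattern described above can be illustrated in a few lines. The following is a minimal Python sketch, under stated assumptions, of that general design: encode incoming frames and audio chunks, merge them into a time-ordered event stream, and keep a short-lived cache the model can "look back" at. All names here (AstraStylePipeline, ingest, recall) are hypothetical, invented for illustration only.

```python
import time
from collections import deque
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    """A timestamped entry in the multimodal timeline (sorted by time)."""
    timestamp: float
    modality: str = field(compare=False)    # "video" or "audio"
    embedding: list = field(compare=False)  # encoded representation

class AstraStylePipeline:
    """Toy sketch, not Google's code: encode inputs as they arrive,
    merge them into one time-ordered event stream, and cache recent
    events so the model can recall what it has 'seen and heard'."""

    def __init__(self, cache_seconds: float = 60.0):
        self.cache_seconds = cache_seconds
        self.timeline: deque = deque()  # doubles as the cache

    def encode(self, payload: bytes) -> list:
        # Stand-in for a real vision/audio encoder model.
        return [len(payload) % 255]

    def ingest(self, payload: bytes, modality: str) -> None:
        now = time.time()
        self.timeline.append(Event(now, modality, self.encode(payload)))
        # Evict events that have aged out of the cache window.
        while self.timeline and now - self.timeline[0].timestamp > self.cache_seconds:
            self.timeline.popleft()

    def recall(self, seconds: float) -> list:
        """Retrieve cached events from the last `seconds` of context."""
        cutoff = time.time() - seconds
        return [e for e in self.timeline if e.timestamp >= cutoff]

# Usage: feed interleaved video frames and audio, then query recent context.
pipeline = AstraStylePipeline(cache_seconds=60.0)
pipeline.ingest(b"frame-bytes", modality="video")
pipeline.ingest(b"audio-chunk", modality="audio")
recent = pipeline.recall(seconds=10.0)
print(f"{len(recent)} events available as context")
```

The key design idea, as the article describes it, is that both modalities land in one timeline rather than separate streams, so a later question can be answered against everything captured in the recent window.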
Gemini converts Deep Research’s results into a podcast
Google has also added the Audio Overview feature to Gemini, allowing users to generate audio summaries (similar to a podcast) from any document or from results obtained through Deep Research, Google's in-depth research tool.
The company explains that "Gemini will create a debate-style podcast between two AI presenters that, with a single click, will start a dynamic, in-depth conversation based on the files you upload. They will summarize the material, draw connections between topics, engage in an active exchange, and contribute unique perspectives."
The new feature is available on the web and in Gemini's mobile apps for Gemini Advanced subscribers globally, and support for more languages is expected to follow. To convert a Deep Research result into a podcast, just select the "Generate Audio Summary" option below the answer and start listening.
The ambitious Gemini updates reflect growing competition for leadership in consumer AI, a race spurred by the success of ChatGPT and reinforced by initiatives from other companies.
Amazon recently announced Alexa+, an improved version of its assistant that incorporates AI to hold natural-language conversations, perform multimodal analysis, and demonstrate contextual awareness. Apple, for its part, has confirmed it is working on a similar update for Siri, although the project has been delayed by technical setbacks.