The recent success of DeepSeek and its artificial intelligence (AI) has been joined by another Chinese giant: Alibaba, the parent company of AliExpress, which claims to have “surpassed” the recent creation of Liang Wenfeng.
Alibaba Cloud has announced a new artificial intelligence (AI) model called Qwen2.5-VL, which belongs to its Qwen family of multimodal large language models (LLMs) and can analyze documents, understand long videos and execute tasks autonomously on ‘smartphones’ and computers.
The Chinese technology company has taken advantage of the interest generated by the DeepSeek assistant, launched by the company of the same name, which is also Chinese, to present a language model that offers capabilities similar to those of this free ‘chatbot’.
For its part, Alibaba Cloud explained that the new Qwen2.5-VL derives from Qwen2-VL, which developers have been testing over the last five months and thanks to which the company has managed to create a “more useful” language model. The new version “makes a significant leap with respect to the previous model” and improves on it by incorporating “powerful document analysis capabilities”, as the company pointed out in a post published on GitHub and on its blog.
Language model
More specifically, it can analyze large documents in several languages, with different text orientations and with other embedded elements, for example handwritten text, tables, charts, chemical formulas and musical scores.
Its general image recognition capabilities have also been extended, with classification now covering different categories of products, objects and scenes, such as plants, animals, monuments or rivers, as well as stills from films and television series.
It also offers improved precision through absolute coordinates and through output in JavaScript Object Notation (JSON), a format designed for data exchange, which serves as the basis for advanced spatial reasoning. For example, it can detect how many motorcycles are on a road, where they are located and whether the riders are wearing helmets, among other options.
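To give an idea of what working with that kind of output might look like, here is a minimal Python sketch based on the motorcycle example. The reply text, the field names (`label`, `bbox_2d`, `helmet`) and the pixel values are illustrative assumptions, not output documented by the company, and `summarize_motorcycles` is a hypothetical helper.

```python
import json

# Hypothetical model reply. The field names ("label", "bbox_2d", "helmet")
# and all pixel values are assumptions made up for this illustration.
reply = """
[
  {"label": "motorcycle", "bbox_2d": [120, 340, 260, 520], "helmet": true},
  {"label": "motorcycle", "bbox_2d": [410, 300, 540, 470], "helmet": false},
  {"label": "car", "bbox_2d": [600, 280, 900, 500]}
]
"""


def summarize_motorcycles(raw_reply):
    """Count motorcycles, locate their box centers, and count riders without a helmet."""
    detections = json.loads(raw_reply)
    motorcycles = [d for d in detections if d["label"] == "motorcycle"]

    centers = []
    for d in motorcycles:
        # Boxes are read as absolute pixel coordinates: left, top, right, bottom.
        x1, y1, x2, y2 = d["bbox_2d"]
        centers.append(((x1 + x2) / 2, (y1 + y2) / 2))

    # A missing "helmet" field is treated as "no helmet seen" in this sketch.
    without_helmet = sum(1 for d in motorcycles if not d.get("helmet", False))
    return {
        "count": len(motorcycles),
        "centers": centers,
        "without_helmet": without_helmet,
    }


summary = summarize_motorcycles(reply)
print(summary)
```

Because the coordinates are absolute, the centers can be drawn directly on the original image without knowing how the model resized it internally.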
The model can also understand videos “that last hours” and, at the same time, extract scene segments in a matter of seconds. It also provides advanced reasoning and decision-making capabilities, which allow it to act as an autonomous agent on ‘smartphones’ and computers. This makes its operation very similar to that of Operator, recently launched by OpenAI.
Architecture of the Chinese AI model
The developers have also announced other updates to the model architecture: the model not only dynamically converts images of different sizes into variable-length token sequences, but also represents coordinates, such as detection points, using the actual scale of the image.
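The two ideas in that paragraph can be illustrated with a short Python sketch. It assumes, purely for illustration, that each visual token covers a 28 by 28 pixel area and that a normalized convention would use a 0 to 1000 grid; both numbers and both function names are hypothetical and are not taken from the company's announcement.

```python
import math


def visual_token_count(width, height, pixels_per_token=28):
    """Estimate how many tokens an image yields when it is not resized to a fixed shape.

    pixels_per_token=28 is an assumed value chosen for this sketch.
    """
    cols = math.ceil(width / pixels_per_token)
    rows = math.ceil(height / pixels_per_token)
    return rows * cols


def normalized_to_absolute(box, width, height, grid=1000):
    """Convert a box on an assumed 0..grid normalized scale to absolute pixels."""
    x1, y1, x2, y2 = box
    return (
        round(x1 * width / grid),
        round(y1 * height / grid),
        round(x2 * width / grid),
        round(y2 * height / grid),
    )


# Larger images produce longer token sequences.
small = visual_token_count(448, 448)
large = visual_token_count(1280, 720)
print(small, large)

# The same normalized box means different pixels on different images,
# whereas an absolute box carries the image scale with it.
print(normalized_to_absolute((250, 500, 750, 1000), 1280, 720))
```

With absolute coordinates the conversion step in `normalized_to_absolute` becomes unnecessary, since the model's numbers already refer to pixels of the input image.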
That covers the so-called spatial dimension. In the temporal dimension, the model introduces both dynamic frames-per-second (FPS) training and absolute time encoding. Thanks to this, the model can learn a sequence and its speed, as well as identify specific moments in a video. In addition, training and inference speed has been improved by implementing the vision transformer (ViT) architecture natively.
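A rough way to picture absolute time encoding is to tie each frame's position to its real timestamp rather than to its index in the sampled sequence. The Python sketch below is only an illustration under that assumption: the function names and the `ids_per_second` parameter are hypothetical and do not describe the model's actual implementation.

```python
def frame_timestamps(num_frames, fps):
    """Timestamps in seconds for frames sampled at a constant rate."""
    return [i / fps for i in range(num_frames)]


def temporal_ids(num_frames, fps, ids_per_second=2):
    """Assign each frame a position id proportional to its absolute time.

    ids_per_second is an assumed resolution chosen for this sketch.
    """
    return [round(t * ids_per_second) for t in frame_timestamps(num_frames, fps)]


def id_to_seconds(position_id, ids_per_second=2):
    """Map a position id back to a moment in the video."""
    return position_id / ids_per_second


# The same encoding applied to clips sampled at different rates:
fast = temporal_ids(4, fps=2.0)   # frames every 0.5 s
slow = temporal_ids(4, fps=1.0)   # frames every 1 s
print(fast, slow)
print(id_to_seconds(slow[-1]))
```

In this sketch the gap between consecutive ids grows when frames are further apart in time, which is one way a model could perceive speed, and any id can be mapped back to a specific second of the video.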
Finally, they have pointed out that, “in the near future”, they will improve the model's reasoning and problem-solving capabilities while incorporating more modalities. Thanks to this, Qwen2.5-VL will be “smarter” and will bring them closer to a comprehensive model able to handle “multiple types of inputs and tasks.”
The Qwen development team has made the Qwen2.5-VL base models available to developers in three sizes (3B, 7B and 72B) to meet their needs. They can be obtained through Hugging Face and ModelScope.