DeepSeek, the Chinese company behind the chatbot of the same name that has unseated ChatGPT in the App Store, has launched Janus Pro, a new image generation model that improves multimodal understanding and the stability of text-to-image generation, offering “highly realistic” and detailed results despite its low resolution.
Janus Pro is the latest member of the Janus family of generative models, an improved version that, as the company explains in its GitHub repository, “incorporates an optimized training strategy, expanded training data and scalability to a larger model size.”
At its base is the DeepSeek VL2 vision-language model, with 4.5 billion activated parameters. According to the company, “it achieves competitive or state-of-the-art performance with similar or fewer activated parameters compared to existing dense and open-source-based models.”
Janus Pro is offered in two sizes, with 1 billion parameters (1B) and 7 billion parameters (7B). The latter offers better multimodal understanding and improves the stability of text-to-image generation.
Specifically, in multimodal understanding, its creators say that it outperforms TokenFlow-XL (13B), something they attribute to “the decoupling of visual encoding for multimodal understanding and generation, which mitigates the conflict between these two tasks.”
As for text-to-image generation, Janus Pro 7B was evaluated on the GenEval and DPG-Bench benchmarks. On GenEval it achieves an overall accuracy of 80 percent, ahead of DALL-E 3 (67 percent), a result attributed to its instruction-following capabilities. On DPG-Bench it reaches a score of 84.19.
DeepSeek also highlights the quality of the results: “highly realistic” images that contain great detail despite a resolution of 384 x 384 pixels, although this low resolution remains a limitation of the model.
DeepSeek rose to prominence this Monday after its chatbot, of the same name, reached first place among free app downloads in the App Store. The chatbot is built on DeepSeek V3, which was trained on 2,048 Nvidia H800 GPUs at a cost of 5.6 million dollars and offers performance similar to or better than leading models such as Claude 3.5 Sonnet, Llama 3.1 405B and GPT-4o.
The Chinese firm also recently launched a family of reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. The latter, according to the company, is able to achieve “a performance in reasoning tasks comparable to OpenAI o1.”