Surprise! DeepSeek Launches New Multimodal Model Janus-Pro Late at Night

DeepSeek releases the Janus-Pro multimodal large model, outperforming many popular models

Jan 28, 2025

In case you missed it, the Chinese AI company DeepSeek has launched Janus-Pro, a brand-new multimodal large model. With this release, DeepSeek has officially entered the text-to-image field, a significant breakthrough for the company in multimodal AI technology.

In GenEval and DPG-Bench benchmark tests, Janus-Pro-7B surpassed OpenAI's DALL-E 3. It also outperformed other popular models, such as Stable Diffusion and Emu3-Gen. Janus-Pro is released under the MIT open-source license, which permits commercial use.

According to DeepSeek, Janus-Pro is an advanced version of its earlier Janus model and follows JanusFlow, which was released on November 13, 2024. Compared to its predecessor, Janus-Pro uses optimized training strategies, expanded training data, and a larger model size. With these improvements, Janus-Pro has made significant progress in multimodal understanding and in following text-to-image instructions. It has also improved the stability of text-to-image generation.

Currently, Janus-Pro can only handle images at a resolution of 384x384, but the quality is impressive considering the model's compact size. As a multimodal model, it can generate images, describe them, identify landmarks, and recognize text within images. It can also provide background knowledge about what an image depicts.