Introducing Mercury: The Game-Changer in Diffusion LLMs for Fast, Mobile Deployments!

Mercury is the first commercially available diffusion LLM of its kind, built for fast, mobile-ready deployment.

Feb 28, 2025

Something new and unprecedented is quietly emerging in the field of artificial intelligence. Inception Labs recently announced the Mercury series of diffusion large language models (dLLMs), a new generation of language models designed for fast, efficient, high-quality text generation. Compared to traditional autoregressive large language models, Mercury delivers up to a 10x speed improvement, generating over 1000 tokens per second on an NVIDIA H100 GPU, a throughput previously achievable only with custom chips.
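To put that throughput in perspective, here is a quick back-of-the-envelope calculation of response latency. The 100 tokens-per-second baseline is an illustrative assumption, not a figure reported by Inception Labs; the 1000 tokens-per-second figure is the reported Mercury throughput on an H100.

```python
# Rough wall-clock time to generate a 500-token response at different throughputs.
# The 100 tokens/s "autoregressive baseline" is an illustrative assumption;
# 1000 tokens/s is the reported Mercury throughput on an NVIDIA H100.
response_tokens = 500

for label, tokens_per_second in [
    ("autoregressive baseline (assumed)", 100),
    ("Mercury on H100 (reported)", 1000),
]:
    seconds = response_tokens / tokens_per_second
    print(f"{label}: {seconds:.1f} s")
```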

Boosted Performance 

Mercury Coder, the first product in the Mercury series, has already debuted in public testing. The model focuses on code generation and delivers strong results, surpassing many existing speed-optimized models such as GPT-4o Mini and Claude 3.5 Haiku on multiple programming benchmarks while running nearly 10 times faster.

Developer feedback on Mercury's code completion has been strongly positive: in Copilot Arena testing, Mercury Coder Mini ranked among the top performers and was one of the fastest models.

Innovative Changes and Enhancements

Most current language models employ an autoregressive approach, generating tokens one at a time from left to right. This inherently sequential process drives up latency and computational cost. Mercury works differently: it uses a "coarse-to-fine" generation method, starting from pure noise and iteratively refining the output over several "denoising" steps. Because each step operates on many token positions in parallel, the model can refine multiple tokens at once, which also supports improved reasoning and better-structured responses.
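Inception Labs has not published Mercury's exact sampling algorithm, so the snippet below is only a minimal sketch of the general idea: start from a fully masked ("noisy") sequence and fill it in coarse-to-fine over a few parallel denoising passes. The DenoisingModel class and its predict() interface are hypothetical stand-ins, not Mercury's API.

```python
# Illustrative sketch of coarse-to-fine generation in a diffusion-style LLM.
# This is NOT Mercury's published algorithm; DenoisingModel and its predict()
# interface are hypothetical stand-ins used only to show the control flow.

import random

MASK = "<mask>"


class DenoisingModel:
    """Hypothetical model: proposes a token and a confidence for each masked slot."""

    def predict(self, tokens):
        # A real model would score all positions in parallel on the GPU;
        # here we return a placeholder token and a random confidence per masked slot.
        return {i: (f"tok{i}", random.random())
                for i, t in enumerate(tokens) if t == MASK}


def generate(model, length=16, steps=4):
    # "Pure noise" for discrete text: every position starts out masked.
    tokens = [MASK] * length
    for step in range(steps):
        proposals = model.predict(tokens)  # every masked slot is scored in one pass
        if not proposals:
            break
        # Coarse-to-fine: commit only the most confident proposals this step,
        # leaving harder positions to be refined in later denoising passes.
        budget = max(1, len(proposals) // (steps - step))
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (token, _confidence) in best[:budget]:
            tokens[i] = token
    return tokens


if __name__ == "__main__":
    print(generate(DenoisingModel()))
```

The key contrast with autoregressive decoding is that each denoising pass scores every unresolved position at once, so the number of model calls is tied to the number of refinement steps rather than to the length of the output.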

With the launch of the Mercury series, Inception Labs has demonstrated the potential of diffusion models for text and code generation. The company plans to introduce diffusion language models for chat applications next, further broadening the scenarios in which the approach can be applied.

The new models will feature enhanced intelligent agent capabilities, enabling complex planning and long-form generation. Their efficiency also allows them to run smoothly on resource-constrained devices such as smartphones and laptops.

The introduction of Mercury marks a significant advance in AI technology, offering substantial improvements in speed and efficiency along with higher-quality solutions for the industry.