# DiffusionGemma

> **TL;DR.** DiffusionGemma is a 26B open-weights language model from Google DeepMind that generates text through discrete diffusion rather than sequential token prediction.

- **Category:** AI / Open-Source Models / Inference
- **Stage:** nascent
- **Age:** 2 days
- **Origin date:** 2026-06-10
- **First detected:** 2026-06-12
- **Canonical URL:** https://earlyterms.com/term/diffusiongemma
- **Sources:** 7 primary URLs

## Definition

DiffusionGemma is a 26B-parameter open-weights language model from Google DeepMind that generates text through discrete diffusion rather than sequential (autoregressive) token prediction. Instead of writing one token at a time, it denoises an entire 256-token block in parallel. That makes generation a compute-bound workload that plays to GPU strengths, whereas token-by-token decoding is typically limited by memory bandwidth.
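
The difference in decoding cost can be sketched with a toy loop. This is a minimal illustration of confidence-ordered unmasking, a scheme commonly used in masked discrete diffusion, and not DiffusionGemma's actual sampler. The names `toy_denoiser` and `generate_block`, the unmasking schedule, and the step count of 16 are assumptions made for the example; only the 256-token block size comes from the definition above.

```python
import random

MASK = None          # placeholder for a position that is not decoded yet
BLOCK_SIZE = 256     # block length stated in the definition
NUM_STEPS = 16       # hypothetical step count, not taken from any source


def toy_denoiser(block, target, rng):
    """Stand-in for one model forward pass over the whole block.

    Returns a (token, confidence) guess for every position at once.
    A real model would predict the tokens; this toy reads them from
    `target` and attaches a random confidence score.
    """
    return [(target[i], rng.random()) for i in range(len(block))]


def generate_block(target, num_steps=NUM_STEPS, seed=0):
    """Fill a fully masked block using `num_steps` parallel passes."""
    rng = random.Random(seed)
    block = [MASK] * len(target)
    passes = 0
    for step in range(num_steps):
        guesses = toy_denoiser(block, target, rng)
        passes += 1
        masked = [i for i, tok in enumerate(block) if tok is MASK]
        # Spread the still-masked positions evenly over the remaining steps.
        remaining_steps = num_steps - step
        keep = -(-len(masked) // remaining_steps)  # ceiling division
        # Commit the most confident guesses and leave the rest masked.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[:keep]:
            block[i] = guesses[i][0]
    return block, passes


target = list(range(BLOCK_SIZE))       # dummy token ids
block, passes = generate_block(target)
print(f"diffusion-style passes: {passes}, token-by-token passes: {len(target)}")
```

Under these assumptions the block is completed in 16 forward passes, where token-by-token decoding would need 256, one per token. Each pass does more work, which is why the approach trades memory traffic for compute.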

Released on June 10, 2026 under [Apache 2.0](https://ai.google.dev/gemma/docs/diffusiongemma), the model builds on the Gemma 4 mixture-of-experts (MoE) backbone and activates only 3.8B of its 26B parameters during inference. It fits within 18 GB of VRAM when quantized and reaches 1,000+ tokens per second on a single H100. It is the first major open diffusion LLM (dLLM) from a tier-one AI lab.
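
The memory figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below counts weight storage only, using 1 GB = 10^9 bytes. The bit widths are illustrative assumptions, since the sources cited here do not say which quantization format the 18 GB figure refers to, and `weight_memory_gb` is a helper written for this example.

```python
TOTAL_PARAMS = 26e9    # total parameters, from the definition
ACTIVE_PARAMS = 3.8e9  # parameters active during inference, from the definition
VRAM_BUDGET_GB = 18    # quantized footprint stated in the definition


def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Bit widths are illustrative assumptions, not taken from the sources.
for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb <= VRAM_BUDGET_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:5.1f} GB ({verdict} in {VRAM_BUDGET_GB} GB)")

# Share of the weights used on each forward pass in the MoE design.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction per pass: {active_fraction:.1%}")
```

At 4 bits per weight the 26B parameters take about 13 GB, which would leave roughly 5 GB of an 18 GB budget for activations and runtime overhead. All experts normally have to stay resident in memory even though only about 15% of the weights are used per pass, so the active-parameter count lowers compute per pass, not the footprint.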

## Analogy

Think of it as a printing press for language: it stamps 256 tokens at once instead of typing them one by one.

## Why it's emerging now

Google DeepMind released DiffusionGemma on June 10, 2026, as the first open-weights discrete diffusion LLM from a major lab, and NVIDIA shipped day-1 support across RTX and DGX platforms at the same time. With 1,000+ tokens per second on a single H100 and an Apache 2.0 license, it opens a new design space for local-first, latency-sensitive AI applications that autoregressive models struggle to serve.

## Related terms

- *parent:* Gemma 4
- *parent:* gemma-4
- *related:* mtp
- *related:* dgx-spark
- *competitor:* Mercury
- *competitor:* Inception Labs Mercury
- *parent:* discrete diffusion LLM
- *alias:* dLLM
- *related:* vibe-island
- *related:* gemma-4-12b
- *related:* mlx
- *parent:* local AI inference

## Sources

1. [DiffusionGemma: 4x faster text generation — Google Blog](https://blog.google/innovation-and-ai/technology/developers-tools/diffusion-gemma-faster-text-generation/)
2. [DiffusionGemma model overview — Google AI for Developers](https://ai.google.dev/gemma/docs/diffusiongemma)
3. [DiffusionGemma: The Developer Guide — Google Developers Blog](https://developers.googleblog.com/diffusiongemma-the-developer-guide/)
4. [DiffusionGemma: first dLLM natively supported in vLLM — vLLM Blog](https://vllm.ai/blog/2026-06-10-diffusion-gemma)
5. [NVIDIA Day-1 Support for DiffusionGemma across RTX and DGX — NVIDIA Blog](https://blogs.nvidia.com/blog/rtx-ai-garage-local-gemma-diffusion/)
6. [DiffusionGemma: 4x Faster Text Generation — Hacker News discussion (323 pts)](https://news.ycombinator.com/item?id=48478471)
7. [DiffusionGemma — Google DeepMind model page](https://deepmind.google/models/gemma/diffusiongemma/)
