# Nemotron Ultra

> **TL;DR.** Nemotron Ultra is NVIDIA's flagship open-weights large language model — a 550B-parameter hybrid Mixture-of-Experts model with only 55B parameters active per token, engineered for long-running agentic workflows that demand both frontier reasoning and high inference throughput.

- **Category:** AI / Large Language Models / Open Weights
- **Stage:** nascent
- **Age:** 1 day
- **Origin date:** 2026-06-04
- **First detected:** 2026-06-04
- **Canonical URL:** https://earlyterms.com/term/nemotron-ultra
- **Sources:** 7 primary URLs

## Definition

Nemotron Ultra is NVIDIA's flagship open-weights large language model — a 550B-parameter hybrid Mixture-of-Experts model with only 55B parameters active per token, engineered for long-running agentic workflows that demand both frontier reasoning and high inference throughput.
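
The idea of "only 55B parameters active per token" can be illustrated with a generic top-k Mixture-of-Experts router. The sketch below is a toy in pure Python. It assumes hypothetical sizes (`NUM_EXPERTS = 10`, `TOP_K = 1`, `HIDDEN = 4`) and plain softmax gating. It is not NVIDIA's LatentMoE design, whose internals this entry does not describe. The 1-of-10 ratio only echoes 55B of 550B.

```python
import math
import random

# Toy, hypothetical sizes; these are NOT Nemotron 3 Ultra's real hyperparameters.
NUM_EXPERTS = 10
TOP_K = 1
HIDDEN = 4

def make_expert(seed: int):
    """Hypothetical expert: a fixed random linear map HIDDEN -> HIDDEN."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert

EXPERTS = [make_expert(i) for i in range(NUM_EXPERTS)]
_router_rng = random.Random(1234)
# Router: one score vector per expert (hypothetical, randomly initialized).
ROUTER = [[_router_rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_EXPERTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x):
    """Route one token vector x to its TOP_K highest-scoring experts."""
    # Router logits: one score per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in ROUTER]
    # Keep only the top-k experts; every other expert is skipped entirely.
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    gates = softmax([logits[i] for i in top])  # renormalize over the selected experts
    out = [0.0] * HIDDEN
    for gate, i in zip(gates, top):
        y = EXPERTS[i](x)  # only the selected experts compute anything for this token
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_forward(token)
print(f"experts used: {used} ({len(used)}/{NUM_EXPERTS} = {len(used) / NUM_EXPERTS:.0%} of expert parameters)")
```

In a full model, non-expert components (for example embeddings, attention or Mamba-2 layers, and any shared experts) run for every token. The active-parameter count is therefore not simply `TOP_K / NUM_EXPERTS` of the total.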

Released June 4, 2026 under the permissive [OpenMDW-1.1 license](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16), the model uses a novel Mamba-2/Transformer/LatentMoE architecture supporting a 1M-token context window. It delivers over 300 tokens per second, roughly 5x faster than comparably capable open models, and ranked first among US open-weights models on the Artificial Analysis Intelligence Index on its launch day.
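
A back-of-envelope calculation makes these figures concrete. This Python sketch makes several assumptions. The BF16 checkpoint stores 2 bytes per parameter. A forward pass costs roughly 2 FLOPs per active parameter per token, a common rule of thumb. The agent trace is a hypothetical 50,000 output tokens. The 60 tok/s baseline is just 300 divided by the claimed ~5x speedup, not a measured number.

```python
# Rules of thumb only: ignores KV cache / Mamba state, activations,
# attention FLOPs, quantization, and parallelism overheads.

TOTAL_PARAMS = 550e9      # total parameters (all experts)
ACTIVE_PARAMS = 55e9      # parameters used per token
BYTES_PER_PARAM_BF16 = 2  # BF16 = 16 bits

def weight_gb(params: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

def forward_gflops_per_token(params: float) -> float:
    """Common approximation: ~2 FLOPs per parameter per generated token."""
    return 2 * params / 1e9

def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to emit `tokens` at a constant decode speed."""
    return tokens / tokens_per_second

# Any token may route to any expert, so all weights typically stay resident.
print(f"BF16 weights, all 550B params: {weight_gb(TOTAL_PARAMS, BYTES_PER_PARAM_BF16):,.0f} GB")
print(f"~GFLOPs/token, sparse (55B active): {forward_gflops_per_token(ACTIVE_PARAMS):,.0f}")
print(f"~GFLOPs/token, if all 550B were dense: {forward_gflops_per_token(TOTAL_PARAMS):,.0f}")

# Hypothetical agent trace; 60 tok/s is 300 / 5, implied by the "roughly 5x" claim.
AGENT_OUTPUT_TOKENS = 50_000
for label, tps in [("300 tok/s", 300), ("implied 5x-slower baseline, 60 tok/s", 60)]:
    print(f"{label}: {generation_seconds(AGENT_OUTPUT_TOKENS, tps):,.0f} s")
```

Under these assumptions the weights alone come to about 1,100 GB (1.1 TB). Per-token compute is roughly 110 GFLOPs, versus about 1,100 GFLOPs for an equally large dense model. The trace takes about 167 s versus about 833 s. Memory scales with total parameters and compute scales with active parameters. That split is the central trade-off of the MoE design.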

## Analogy

Think of it as a 20-cylinder engine that fires only 2 cylinders at a time: massive reserve capacity, everyday efficiency.

## Why it's emerging now

NVIDIA launched Nemotron 3 Ultra on June 4, 2026 as its first open-weights frontier model: 550B parameters (55B active), 1M-token context, 300+ tok/s throughput, and the top US open-weights rank on the Artificial Analysis Intelligence Index. NVIDIA positions it as the fastest open model available for agentic use cases, and its OpenMDW-1.1 license permits free commercial deployment.

## Related terms

- *competitor:* DeepSeek V4
- *competitor:* Kimi K2.6
- *competitor:* Qwen3
- *competitor:* GLM 5.1
- *competitor:* Gemma 4
- *parent:* Llama 3.1
- *parent:* Mixture of Experts
- *related:* long-running agents
- *related:* agentic AI
- *related:* NVIDIA NIM
- *related:* context window
- *related:* DGX Spark

## Sources

1. [NVIDIA Developer Blog — Nemotron 3 Ultra launch post](https://developer.nvidia.com/blog/nvidia-nemotron-3-ultra-powers-faster-more-efficient-reasoning-for-long-running-agents/)
2. [NVIDIA Research — Nemotron 3 Ultra technical overview](https://research.nvidia.com/labs/nemotron/Nemotron-3-Ultra/)
3. [HuggingFace — Nemotron-3-Ultra-550B-A55B-BF16 model card](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16)
4. [Artificial Analysis — Nemotron 3 Ultra launch analysis](https://artificialanalysis.ai/articles/nvidia-nemotron-3-ultra-launch-announced)
5. [ChatForest Builders Log — architecture and builder considerations](https://chatforest.com/builders-log/nvidia-nemotron-3-ultra-550b-moe-open-weights-computex-2026/)
6. [Latent Space — AI News: Cosmos 3, Nemotron 3 Ultra, RTX Spark](https://www.latent.space/p/ainews-nvidia-cosmos-3-nemotron-3)
7. [NVIDIA Newsroom — Nemotron 3 family announcement](https://nvidianews.nvidia.com/news/nvidia-debuts-nemotron-3-family-of-open-models)

---
_Generated by EarlyTerms · https://earlyterms.com/term/nemotron-ultra_
