# Gemini 3.1 Flash TTS

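> **TL;DR.** [Gemini 3.1 Flash TTS](https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-tts/) is Google DeepMind's text-to-speech model: expressive speech in 70+ languages, steered by 200+ audio tags and free-form director's-note prompts, launched in preview on April 15, 2026.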

- **Category:** AI / Foundation Models / Voice
- **Stage:** validating
- **Age:** 49 days
- **Origin date:** 2026-04-15
- **First detected:** 2026-04-16
- **Canonical URL:** https://earlyterms.com/term/gemini-3-1-flash-tts
- **Sources:** 7 primary URLs

## Definition

[Gemini 3.1 Flash TTS](https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-tts/) is Google DeepMind's text-to-speech model that generates expressive speech in 70+ languages, steered by 200+ audio tags plus free-form director's-note prompts (accent, pace, emotion, scene direction). Output is watermarked with SynthID.

Google launched the preview on [April 15, 2026](https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-tts-preview) across the Gemini API, AI Studio, Vertex AI, and Google Vids. The model scored an Elo of 1,211 on the Artificial Analysis (AA) TTS leaderboard and is priced at $1 per million text-input tokens and $20 per million audio-output tokens. That combination places it in AA's "most attractive" quality-versus-cost quadrant and undercuts ElevenLabs' Flash/Turbo tier.
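To make the per-token pricing concrete, here is a minimal sketch of a cost estimate at the two quoted preview rates. The function name `estimate_cost_usd` and the example token counts are illustrative assumptions, not part of any Google SDK, and real billing may include factors not covered here.

```python
from decimal import Decimal

# Preview prices quoted above, in US dollars per one million tokens.
TEXT_INPUT_USD_PER_M = Decimal("1")
AUDIO_OUTPUT_USD_PER_M = Decimal("20")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(text_input_tokens: int, audio_output_tokens: int) -> Decimal:
    """Estimate the charge for one request at the quoted preview rates.

    Hypothetical helper: token counts must come from the caller, for
    example from usage metadata returned with a response.
    """
    if text_input_tokens < 0 or audio_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    text_cost = Decimal(text_input_tokens) * TEXT_INPUT_USD_PER_M / TOKENS_PER_M
    audio_cost = Decimal(audio_output_tokens) * AUDIO_OUTPUT_USD_PER_M / TOKENS_PER_M
    return text_cost + audio_cost


if __name__ == "__main__":
    # Made-up request size: 2,000 prompt tokens in, 50,000 audio tokens out.
    print(estimate_cost_usd(2_000, 50_000))
```

Because the audio rate is twenty times the text rate, output length rather than prompt length dominates the bill in most realistic requests.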

## Example

Simon Willison's [hands-on walkthrough](https://simonwillison.net/2026/Apr/15/gemini-31-flash-tts/) uses a character-profile prompt containing the line 'Jaz is from Brixton, London', which produces a London accent. Changing the location to 'Newcastle' or 'Exeter' audibly shifts the accent without any parameter change. The model also supports multi-speaker dialogue natively, so a single prompt can render a full two-voice scene.
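The accent swap described above comes down to editing one line of prompt text. The sketch below assembles a directed, multi-speaker prompt as a plain string and shows that changing a character's hometown changes exactly one line. The `CharacterProfile` class, the `build_directed_prompt` helper, the second character, and the Scene/Characters/Transcript layout are all assumptions made for illustration; they are not the format used in the walkthrough or documented by Google, and the code does not call the API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CharacterProfile:
    """Hypothetical container for one speaker's direction notes."""
    name: str
    hometown: str
    delivery: str  # free-form note on pace and emotion


def build_directed_prompt(profiles, scene, lines):
    """Assemble a director's-note style prompt as plain text.

    `lines` is a list of (speaker_name, text) pairs. Every speaker must
    have a profile, so a typo in a name fails loudly.
    """
    known = {p.name for p in profiles}
    for speaker, _ in lines:
        if speaker not in known:
            raise ValueError(f"no profile for speaker {speaker!r}")

    parts = [f"Scene: {scene}", "", "Characters:"]
    for p in profiles:
        parts.append(f"- {p.name} is from {p.hometown}. Delivery: {p.delivery}.")
    parts += ["", "Transcript:"]
    for speaker, text in lines:
        parts.append(f"{speaker}: {text}")
    return "\n".join(parts)


if __name__ == "__main__":
    jaz = CharacterProfile("Jaz", "Brixton, London", "fast and upbeat")
    sam = CharacterProfile("Sam", "Exeter", "slow and dry")
    script = [("Jaz", "You're tuned in, let's go!"), ("Sam", "If we must.")]

    london = build_directed_prompt([jaz, sam], "Late-night radio show", script)
    # Swap only the hometown; everything else in the prompt is unchanged.
    geordie = build_directed_prompt(
        [replace(jaz, hometown="Newcastle"), sam], "Late-night radio show", script
    )
    print(london)
    print("---")
    print(geordie)
```

Keeping the profile separate from the transcript means the same script can be re-rendered with different direction, which mirrors the one-line accent change in the walkthrough.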

## Analogy

Think of it as a voice actor you direct with stage notes — you describe the scene, the character, and the accent, and the model plays the part.

## Why it's emerging now

Google DeepMind launched Gemini 3.1 Flash TTS in preview on April 15, 2026, with 70+ languages, 200+ audio tags, native multi-speaker dialogue, and an Elo of 1,211 on the Artificial Analysis leaderboard. Priced at $20/M audio-output tokens, it materially undercuts ElevenLabs' Flash tier while shipping directly into Vertex AI and Google Vids.

## Related terms

- *parent:* Gemini 3.1 Flash
- *parent:* Gemini API
- *related:* Gemini 3.1 Flash Lite
- *related:* Gemini 3.1 Flash Image
- *related:* SynthID
- *competitor:* ElevenLabs Flash
- *competitor:* gpt-4o-audio-preview
- *related:* Claude Opus 4.7
- *related:* audio tags
- *related:* voice cloning

## Sources

1. [Google Blog — Gemini 3.1 Flash TTS launch](https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-tts/)
2. [Gemini API docs — 3.1 Flash TTS preview](https://ai.google.dev/gemini-api/docs/models/gemini-3.1-flash-tts-preview)
3. [Google Cloud — Vertex AI launch post](https://cloud.google.com/blog/products/ai-machine-learning/gemini-3-1-flash-tts-on-google-cloud/)
4. [Simon Willison — hands-on with directed prompts](https://simonwillison.net/2026/Apr/15/gemini-31-flash-tts/)
5. [DeepMind model card — Gemini 3.1 Flash Audio](https://deepmind.google/models/model-cards/gemini-3-1-flash-audio/)
6. [Artificial Analysis — TTS leaderboard entry](https://artificialanalysis.ai/text-to-speech/models/gemini-3-1-tts)
7. [MarkTechPost coverage](https://www.marktechpost.com/2026/04/15/google-ai-launches-gemini-3-1-flash-tts-a-new-benchmark-in-expressive-and-controllable-ai-voice/)

---
_Generated by EarlyTerms · https://earlyterms.com/term/gemini-3-1-flash-tts_