Deepgram TTS: MP3 vs Opus vs AAC vs WAV (2025 Guide)

Deepgram’s Text-to-Speech API lets you control the encoding, container, and sample rate of every voice response. That flexibility is powerful, but the docs don’t spell out when to use MP3, Opus, AAC, or uncompressed WAV. Here’s the breakdown I rely on when choosing a format for a product launch, real-time chatbot, or post-production pipeline.

For the full list of available parameters, see the Deepgram Text-to-Speech API Reference Docs.

Quick comparison table

Encoding	Container	Sample rate	Bitrate	Quality	File size	Latency	Ideal use case	API params
`mp3`	none	22050 Hz	48 kbps (default)	Good	Moderate	Fast	General TTS playback	`?model=voiceModel&encoding=mp3`
`opus`	`ogg`	48000 Hz (fixed)	4 – 650 kbps (configurable)	Very good	Smallest	Fastest	Real-time chat, streaming	`?model=voiceModel&encoding=opus&container=ogg`
`aac`	none	Device-dependent (typically 48 kHz)	4 – 192 kbps	Better than MP3	Small	Medium	Mobile apps, quality-focused web	`?model=voiceModel&encoding=aac`
`linear16`	`wav`	16000 – 48000 Hz	Uncompressed PCM	Highest	Largest	Slowest	Audio analysis, telephony, signal processing	`?model=voiceModel&encoding=linear16&container=wav&sample_rate=48000`

Format breakdown

MP3 (default)

When to use it: Standard playback, podcasts, browser compatibility.
Why it works: Balanced quality at 48 kbps, plays on virtually every device.
Trade-off: Larger than Opus and not as efficient as AAC, but the widest compatibility wins when you don’t control the client.

Opus in OGG (my fastest pick)

When to use it: Real-time agents, streaming experiences, low-bandwidth scenarios.
Why it works: Opus was built for voice. The audio is crisp, file sizes are tiny, and latency stays low during encoding and playback.
Gotcha: Sample rate is locked at 48 kHz. Don’t include sample_rate in your query; it triggers a 400 error.
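
One way to guard against that gotcha is to build the query string in a helper that refuses `sample_rate` for fixed-rate encodings. This is a minimal sketch: `build_speak_query` and `FIXED_RATE_ENCODINGS` are hypothetical names, and the set of fixed-rate encodings is taken from this guide rather than checked against the API.

```python
from urllib.parse import urlencode

# Encodings whose sample rate is fixed, per this guide (assumption, not an API lookup).
FIXED_RATE_ENCODINGS = {"opus"}


def build_speak_query(model, encoding, container=None, sample_rate=None):
    """Return a query string for /v1/speak, rejecting sample_rate where it is fixed."""
    if encoding in FIXED_RATE_ENCODINGS and sample_rate is not None:
        raise ValueError(f"sample_rate cannot be set for encoding={encoding!r}")
    params = {"model": model, "encoding": encoding}
    # Only include optional parameters the caller actually supplied.
    if container is not None:
        params["container"] = container
    if sample_rate is not None:
        params["sample_rate"] = sample_rate
    return urlencode(params)


print(build_speak_query("voiceModel", "opus", container="ogg"))
# model=voiceModel&encoding=opus&container=ogg
```

Failing locally with a `ValueError` is cheaper than discovering the problem as a 400 response in production.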

AAC

When to use it: Native mobile apps or web players that support AAC and need higher quality than MP3 at similar bitrates.
Why it works: Efficient compression without the metallic artifacts you sometimes hear in MP3.
Note: Sample rate is managed internally by Deepgram; no need to override it.

Linear16 WAV

When to use it: Signal processing, telephony integration, or any workflow that requires raw PCM.
Why it works: Zero compression. You get every detail for post-processing, noise analysis, or on-prem speech analytics.
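Trade-off: Huge files and slower transfers. Raw PCM costs almost nothing to encode or decode, but the payload is many times larger than a compressed stream. Overkill for casual playback.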
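
To put rough numbers on the size gap, the sketch below compares 10 seconds of 16-bit PCM at 48 kHz with a 48 kbps compressed stream. It assumes mono audio and a constant bitrate, and it ignores WAV header and container overhead; the function names are illustrative.

```python
def pcm_bytes(seconds, sample_rate, channels=1, bytes_per_sample=2):
    """Size of raw 16-bit PCM audio, excluding the WAV header."""
    return int(seconds * sample_rate * channels * bytes_per_sample)


def compressed_bytes(seconds, bitrate_bps):
    """Approximate size of a constant-bitrate stream, ignoring container overhead."""
    return int(seconds * bitrate_bps / 8)


wav_size = pcm_bytes(10, 48000)          # 10 * 48000 * 2 = 960000 bytes
mp3_size = compressed_bytes(10, 48000)   # 10 * 48000 / 8 = 60000 bytes
print(wav_size, mp3_size, wav_size // mp3_size)  # 960000 60000 16
```

Under these assumptions the uncompressed audio is about 16 times larger, which is where the extra transfer time comes from.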

Decision guide

Simple playback with broad compatibility? Stick to mp3.
Need low-latency streaming or tiny files? Use opus with an ogg container.
Want better compression without losing polish? aac hits the balance for mobile/web apps.
Performing downstream audio analysis? Go linear16 + wav.
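
The guide above can be captured as a small lookup so the choice lives in one place in your code. This is only a sketch: the need labels (`compatibility`, `low_latency`, and so on) and the `pick_format` helper are hypothetical, and falling back to MP3 is my own choice based on it being the default.

```python
# Hypothetical need labels mapped to the parameters recommended in the decision guide.
FORMAT_BY_NEED = {
    "compatibility": {"encoding": "mp3"},
    "low_latency": {"encoding": "opus", "container": "ogg"},
    "mobile_quality": {"encoding": "aac"},
    "analysis": {"encoding": "linear16", "container": "wav"},
}


def pick_format(need):
    """Return a copy of the query parameters for a need, defaulting to MP3."""
    return dict(FORMAT_BY_NEED.get(need, FORMAT_BY_NEED["compatibility"]))


print(pick_format("low_latency"))  # {'encoding': 'opus', 'container': 'ogg'}
```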

Example API snippets

# The speak endpoint takes a POST with the text in a JSON body, e.g. {"text": "Hello"}

# Default MP3 (well-supported general output)
POST /v1/speak?model={{ voiceModel }}&encoding=mp3

# Opus + OGG (low latency, minimal file size)
POST /v1/speak?model={{ voiceModel }}&encoding=opus&container=ogg

# AAC (higher quality vs MP3 at similar size)
POST /v1/speak?model={{ voiceModel }}&encoding=aac

# Linear16 WAV (raw audio for processing)
POST /v1/speak?model={{ voiceModel }}&encoding=linear16&container=wav&sample_rate=48000
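
For completeness, here is one way to turn those query strings into a full request using only the Python standard library. It is a sketch under a few assumptions: the text is sent as a JSON body on a POST to `https://api.deepgram.com/v1/speak`, the key goes in an `Authorization: Token ...` header, and `build_tts_request` is a hypothetical helper. The model name and API key are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_tts_request(api_key, text, model, encoding, **extra):
    """Build (but do not send) a POST request for the speak endpoint."""
    query = urlencode({"model": model, "encoding": encoding, **extra})
    return Request(
        f"{SPEAK_URL}?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute a real key and voice model before sending.
req = build_tts_request("YOUR_API_KEY", "Hello", "voiceModel", "opus", container="ogg")
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and write the response bytes to a file with the matching extension.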

Final thoughts

Deepgram’s defaults already deliver solid TTS, but choosing the right encoding can trim response time and payload size or preserve detail for machine listening. Use Opus when latency matters, MP3 when compatibility matters, AAC when quality-to-size matters, and WAV when signal fidelity is non-negotiable. The right parameter tweak turns a generic voice into a production-ready asset.