Deepgram TTS: MP3 vs Opus vs AAC vs WAV (2025 Guide)

September 13, 2025

Deepgram’s Text-to-Speech API lets you control the encoding, container, and sample rate of every voice response. That flexibility is powerful, but the docs don’t spell out when to use MP3, Opus, AAC, or uncompressed WAV. Here’s the breakdown I rely on when choosing a format for a product launch, a real-time chatbot, or a post-production pipeline.

For the full list of available parameters, see the Deepgram Text-to-Speech API Reference Docs.

Quick comparison table

| Encoding | Container | Sample rate | Bitrate | Quality | File size | Latency | Ideal use case | API params |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| mp3 | none | 22050 Hz | 48 kbps (default) | Good | Moderate | Fast | General TTS playback | ?model=voiceModel&encoding=mp3 |
| opus | ogg | 48000 Hz (fixed) | 4 – 650 kbps (configurable) | Very good | Smallest | Fastest | Real-time chat, streaming | ?model=voiceModel&encoding=opus&container=ogg |
| aac | none | Device-dependent (typically 48 kHz) | 4 – 192 kbps | Better than MP3 | Small | Medium | Mobile apps, quality-focused web | ?model=voiceModel&encoding=aac |
| linear16 | wav | 16000 – 48000 Hz | Uncompressed PCM | Highest | Largest | Slowest | Audio analysis, telephony, signal processing | ?model=voiceModel&encoding=linear16&container=wav&sample_rate=48000 |
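
To put the "File size" column in rough numbers, here is a minimal sketch that estimates payload size from the bitrates and sample rates in the table. It assumes mono output and a constant bitrate, and it ignores container headers. The function names are my own, not part of any Deepgram SDK.

```python
def compressed_size_bytes(bitrate_bps: int, seconds: float) -> int:
    """Approximate size of a constant-bitrate stream (MP3, Opus, AAC)."""
    return int(bitrate_bps * seconds / 8)


def linear16_size_bytes(sample_rate_hz: int, seconds: float, channels: int = 1) -> int:
    """Approximate size of 16-bit PCM: 2 bytes per sample per channel."""
    return int(sample_rate_hz * 2 * channels * seconds)


if __name__ == "__main__":
    duration = 10.0  # seconds of speech
    print("mp3 @ 48 kbps:      ", compressed_size_bytes(48_000, duration), "bytes")
    print("linear16 @ 48000 Hz:", linear16_size_bytes(48_000, duration), "bytes")
```

For ten seconds of speech, that works out to roughly 60 KB for MP3 at 48 kbps versus roughly 960 KB for linear16 at 48000 Hz, a 16x difference before any WAV header is counted.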

Format breakdown

MP3 (default)

Opus in OGG (my fastest pick)

AAC

Linear16 WAV

Decision guide

Example API snippets

# Each request is a POST with the text to speak in a JSON body, e.g. {"text": "Hello"}

# Default MP3 (well-supported general output)
POST /v1/speak?model={{ voiceModel }}&encoding=mp3

# Opus + OGG (low latency, minimal file size)
POST /v1/speak?model={{ voiceModel }}&encoding=opus&container=ogg

# AAC (higher quality vs MP3 at similar size)
POST /v1/speak?model={{ voiceModel }}&encoding=aac

# Linear16 WAV (raw audio for processing)
POST /v1/speak?model={{ voiceModel }}&encoding=linear16&container=wav&sample_rate=48000
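
As a rough sketch, the query strings above could be assembled in Python like this. It assumes the usual Deepgram REST conventions: a POST to https://api.deepgram.com/v1/speak, an `Authorization: Token ...` header, and the text in a JSON body. `build_speak_request` and the model name in the usage example are hypothetical placeholders, so check them against the API reference before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL for Deepgram's hosted API.
DEEPGRAM_SPEAK_URL = "https://api.deepgram.com/v1/speak"


def build_speak_request(text, api_key, model, **audio_params):
    """Build (but do not send) a TTS request with the given audio parameters."""
    query = urllib.parse.urlencode({"model": model, **audio_params})
    return urllib.request.Request(
        f"{DEEPGRAM_SPEAK_URL}?{query}",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speak_request(
        "Hello from the format guide.",
        api_key="YOUR_DEEPGRAM_API_KEY",  # placeholder, not a real key
        model="your-voice-model",         # placeholder for {{ voiceModel }}
        encoding="opus",
        container="ogg",
    )
    print(req.get_method(), req.full_url)
    # To actually fetch audio (needs a real key and network access):
    # with urllib.request.urlopen(req) as resp, open("speech.ogg", "wb") as f:
    #     f.write(resp.read())
```

Building the request separately from sending it keeps the format choice easy to inspect and test without touching the network.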

Final thoughts

Deepgram’s defaults already deliver solid TTS, but choosing the right encoding can shave seconds off response time or preserve detail for machine listening. Use Opus when latency matters, MP3 when compatibility matters, AAC when quality-to-size matters, and WAV when signal fidelity is non-negotiable. The right parameter tweak turns a generic voice into a production-ready asset.
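
The rule of thumb above can be written down as a small lookup. This is only a sketch: `PRESETS` and `params_for` are names I made up, and the parameter values are copied from the comparison table in this post.

```python
from urllib.parse import urlencode

# Priority -> audio parameters, mirroring the recommendations in this post.
PRESETS = {
    "latency": {"encoding": "opus", "container": "ogg"},
    "compatibility": {"encoding": "mp3"},
    "quality_per_byte": {"encoding": "aac"},
    "fidelity": {"encoding": "linear16", "container": "wav", "sample_rate": 48000},
}


def params_for(priority: str, model: str) -> str:
    """Return the query string for a given priority, or raise on unknown input."""
    if priority not in PRESETS:
        raise ValueError(
            f"unknown priority {priority!r}; expected one of {sorted(PRESETS)}"
        )
    return "?" + urlencode({"model": model, **PRESETS[priority]})


if __name__ == "__main__":
    for name in PRESETS:
        print(f"{name:>16}: {params_for(name, 'your-voice-model')}")
```

Keeping the presets in one table means a product decision ("we care most about latency") maps to a single key rather than to query strings scattered through the codebase.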