kokoro-tts-82m has become my go-to lightweight voice model for short-form content. The voices are free, expressive, and, once you dial in the right speed, good enough to skip the paid TTS tiers for many projects. Here’s the cheat sheet I keep on hand, plus three polished scripts (two English, one Mandarin) that pair well with the model.
Why Kokoro 82M?
- Small but expressive. Despite the 82M parameter size, it nails conversational intonation without the metallic artifacts you hear in older open models.
- Three distinct voices. Between a clean American accent, an ASMR-style whisper, and a lively Mandarin tone, you can cover most short-form formats.
- 1.3x speed sweet spot. Bumping playback to 1.3x retains clarity while sounding punchier on TikTok, Instagram Reels, and Xiaohongshu.
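To sanity-check timing before recording, here is a quick sketch of the speed arithmetic. It assumes the speed-up is uniform, so clip length scales by 1/speed; the function name is my own and not part of Kokoro.

```python
def duration_at_speed(natural_seconds: float, speed: float = 1.3) -> float:
    """Length of a clip after a uniform speed-up (assumed: duration / speed)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return natural_seconds / speed


# A read that takes 60 seconds at normal pace, played at 1.3x.
print(round(duration_at_speed(60.0), 1))  # 46.2
```

In practice the model's pacing will not match this exactly, so treat the number as a rough target when trimming a script.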
Voice palette
| Voice | Best for | Notes |
|---|---|---|
| `af_bella` | General English narration | Balanced American female voice, nails promo copy and explainer videos. |
| `af_nicole` | ASMR, whisper ads, bedtime content | Softer delivery designed for close-mic storytelling. |
| `zf_xiaoyi` | Mandarin content | Lively female voice with crisp consonants; strongest choice among Kokoro’s Chinese options. |
Recommended generation settings
- Model: `kokoro-tts-82m`
- Voice parameter: One of the IDs above (`af_bella`, `af_nicole`, `zf_xiaoyi`).
- Speed: `1.3`
- Audio length: Under 45 seconds per clip yields the cleanest output and keeps buffer underruns at bay.
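The settings above can be captured in a small helper so every clip is generated the same way. This is a minimal sketch, not Kokoro's actual API: the payload key names (`input`, `voice`, `speed`, `model`) and the helper names are assumptions, so map them to whatever your Kokoro host expects.

```python
from dataclasses import dataclass, asdict

# Voice IDs from the palette table.
VOICES = {"af_bella", "af_nicole", "zf_xiaoyi"}
MAX_CLIP_SECONDS = 45.0  # guideline from the settings list


@dataclass(frozen=True)
class KokoroSettings:
    voice: str
    speed: float = 1.3
    model: str = "kokoro-tts-82m"

    def __post_init__(self):
        # Fail early on typos instead of sending a bad request.
        if self.voice not in VOICES:
            raise ValueError(f"unknown voice: {self.voice!r}")
        if self.speed <= 0:
            raise ValueError("speed must be positive")


def build_payload(text: str, settings: KokoroSettings) -> dict:
    """Assemble a request body. The key names are hypothetical."""
    return {"input": text, **asdict(settings)}


def within_clip_limit(clip_seconds: float) -> bool:
    """True when a clip stays inside the 45-second guideline."""
    return clip_seconds <= MAX_CLIP_SECONDS


payload = build_payload("The link is in my bio.", KokoroSettings(voice="af_bella"))
print(payload["voice"], payload["speed"])  # af_bella 1.3
```

Keeping validation in the settings object means a mistyped voice ID fails before any audio is requested.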
Script #1 > English (47s @ 1.3x, af_bella)
I used this script to generate a promotional video for a web app that catalogs restaurants reviewed by food influencers. The energetic tone and conversational language work well for selling a productized service that curates places to eat.
I'll be honest, watching food videos on YouTube is a complete waste of time.
Unless, of course, you actually go to the places. But who actually remembers to save them all?
You watch these amazing reviews, you promise yourself you'll go there one day, but then you're out and about, hungry, and can't remember a single place. So you just end up getting fast food.
That's exactly why I'm obsessed with this web app. It literally catalogs all the places reviewed by your favorite food influencers.
It shows you the best spots within a 60-minute drive from your current location, and the list is always being updated with new videos. You can even filter by cuisine and click to watch the original review!
This app is a total game-changer. I honestly don't know how I lived without it, and everyone I know who's a foodie is using it now.
You seriously have to try it out for yourself. The link is in my bio.
Script #2 > English short form (30s @ 1.3x, af_bella)
Great for Reels, Shorts, or lead-in narrations.
Watching food videos on YouTube is a total time-waster unless you actually go to the places. But who remembers them all?
You watch reviews, promise to visit, then forget where when you're hungry and end up with fast food.
That's why I'm obsessed with this app: it catalogs places reviewed by your favorite food influencers, shows you the best spots within 60 minutes of your location, and updates constantly. You can filter by cuisine and watch the original reviews.
It’s a game-changer. Everyone I know who's a foodie uses it. You have to try it. Link’s in my bio.
Script #3 > Mandarin (64s @ 1.3x, zf_xiaoyi)
Optimized for Xiaohongshu or Douyin. Feel free to pair it with subtitles and vertical footage.
老实说,在 YouTube 上刷美食视频,真的超级浪费时间。看着那些超赞的餐厅,嘴馋是馋了,结果一个都没去过。
有没有这种情况:刷视频的时候疯狂种草,说“下次一定去”,等到真的饿了想吃点好的,却一个地方都想不起来,最后还是随便点个快餐了事。
所以我最近疯狂爱上了一个超实用的美食神器!它会自动整理你关注的美食博主推荐过的所有店。
还能根据你的位置,推荐 60 分钟车程以内的餐厅;支持菜系筛选;点进去还能直接看原视频,回忆种草瞬间。
完全是吃货必备!我现在出门吃饭都靠它找店,朋友们都在用,真的改变我生活的 APP。
一定要自己试试看,链接我放在主页啦,戳一下就能体验!
Workflow tips
- Pair with Grok avatars. Generate the voice here, then layer it over stitched Grok clips (see my talking avatar roundup) for hyper-realistic explainers.
- Batch render with n8n. Drop these scripts into an `n8n` automation that hits Kokoro’s API, saves audio to storage, and posts to your content calendar.
- Subtitles matter. Export SRT files from your editor or use Whisper to auto-caption the final cut; the 1.3x delivery still benefits from readable subtitles.
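For the subtitle step, here is a rough sketch of turning timed segments into SRT text. It assumes each segment is a dict with `start`, `end` (in seconds), and `text`, which is the shape I expect from a Whisper transcription; the function names and sample timings are made up for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Illustrative timings only, not measured from a real clip.
sample = [
    {"start": 0.0, "end": 2.5, "text": " It's a game-changer."},
    {"start": 2.5, "end": 5.04, "text": " Link's in my bio."},
]
print(segments_to_srt(sample))
```

Write the returned string to a `.srt` file next to the video and most editors will pick it up on import.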
Final take
Kokoro 82M proves you don’t need massive subscription fees to get professional-grade voiceovers. With three dependable voices and a simple speed tweak, you can crank out multilingual content that sounds like you paid for a studio session. Save the premium credits for projects that demand ElevenLabs-level nuance; for day-to-day social content, Kokoro gets it done.