Fast, expressive, open source TTS – authenticated by default.
The only open-source TTS with built-in watermarking. Up to 6× faster than real-time on a GPU. Paralinguistic prompting. Zero-shot cloning. MIT licensed.
Voice Cloning with 5 seconds of reference audio
Outperforms proprietary closed-source models head-to-head.
Gen Z Girl
Liam Neeson
Fast Enough for Real-Time, Trustworthy in Production
Chatterbox Turbo is the first open-source TTS that doesn’t ask you to choose your fighter. It’s fast, expressive, and MIT licensed. Every output is authenticated with PerTh watermarking, so you can build voice AI that’s both open and accountable.
6× faster than real-time on a GPU
350M parameters and 75 ms latency
5 seconds of audio for voice cloning
Built for Production, Designed for Developers
Unique Emotion Control
The first open-source model with emotion exaggeration control. Adjust intensity from monotone to dramatically expressive with a single parameter.
Real-Time Voice Synthesis
Faster-than-real-time inference with alignment-informed generation. Perfect for real-time applications, voice assistants, and interactive media.
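To make "faster than real time" concrete, here is a minimal sketch of the arithmetic behind a speedup figure. The function names are hypothetical helpers written for this illustration, not part of Chatterbox, and the 6× value is the upper bound quoted on this page rather than a measurement.

```python
def speedup_over_real_time(audio_seconds: float, synthesis_seconds: float) -> float:
    """Return how many times faster than real time a synthesis run was."""
    if audio_seconds <= 0 or synthesis_seconds <= 0:
        raise ValueError("durations must be positive")
    return audio_seconds / synthesis_seconds


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Return the wall-clock seconds needed to produce a clip at a given speedup."""
    if audio_seconds <= 0 or speedup <= 0:
        raise ValueError("duration and speedup must be positive")
    return audio_seconds / speedup


# At the advertised upper bound of 6x, a 12-second clip takes about 2 seconds.
print(synthesis_time(12.0, 6.0))          # 2.0
print(speedup_over_real_time(12.0, 2.0))  # 6.0
```

Any speedup above 1.0 means audio is produced faster than it plays back, which is the basic requirement for streaming it to a listener without gaps.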
Zero-Shot Voice Cloning
Clone any voice with just a few seconds of reference audio. No training required. Includes easy voice conversion scripts.
Watermarked & Secure
Built-in watermarking for generated audio lets you know when content was created by Chatterbox, while maintaining high audio quality.
Paralinguistic Prompting
Text-based tags that tell the model to perform natural vocal reactions in the cloned voice. Supported tags include sigh, gasp, cough and more.
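As a rough illustration of working with tagged text, the sketch below scans a script for tags before it is sent to the model. It assumes tags are written inline in square brackets (for example `[sigh]`), which this page does not specify, so check the official docs for the exact syntax. `KNOWN_TAGS`, `find_tags`, and `unknown_tags` are hypothetical names for this example, not part of Chatterbox, and the tag set holds only the three tags named above.

```python
import re

# Only the tags named on this page; the full supported set is not listed here.
KNOWN_TAGS = {"sigh", "gasp", "cough"}

# Assumption: tags appear inline in square brackets, e.g. "[sigh]".
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def find_tags(text: str) -> list[str]:
    """Return every bracketed tag name in the order it appears."""
    return TAG_PATTERN.findall(text)


def unknown_tags(text: str) -> list[str]:
    """Return tag names in the text that are not in KNOWN_TAGS."""
    return [tag for tag in find_tags(text) if tag not in KNOWN_TAGS]


line = "I can't believe it's Monday again [sigh] but here we are [yawn]."
print(find_tags(line))     # ['sigh', 'yawn']
print(unknown_tags(line))  # ['yawn']
```

A check like this can catch typos in tags early, before they are read aloud as literal text or silently ignored.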
Developer First
Simple pip install, comprehensive docs. Built by developers, for developers. Available on GitHub and Hugging Face.
Head-to-Head Testing
We ran an evaluation through Podonos to compare how natural and high-quality the speech generated by Resemble AI’s Chatterbox Turbo, ElevenLabs Turbo 2.5, Cartesia Sonic 3, and VibeVoice 7B is. Every system generated audio from the same 5- to 10-second reference clips and identical text inputs (zero-shot, with no prompt engineering or audio processing).
Head-to-Head Win Rates
Comparison of preference rates across all matchups.
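For readers who want to see how a head-to-head win rate can be derived from pairwise preference votes, here is a minimal sketch. It assumes each vote names the two systems compared and the one the listener preferred, with no ties. This is an illustration of the general idea, not Podonos's methodology, and the sample votes are placeholders, not results from the evaluation.

```python
from collections import defaultdict


def win_rates(votes):
    """Map each matchup to the share of votes won by each of its two systems."""
    counts = defaultdict(lambda: defaultdict(int))
    for system_a, system_b, winner in votes:
        if winner not in (system_a, system_b):
            raise ValueError(f"winner {winner!r} is not part of the matchup")
        # Sort the pair so (a, b) and (b, a) count as the same matchup.
        matchup = tuple(sorted((system_a, system_b)))
        counts[matchup][winner] += 1

    rates = {}
    for matchup, tally in counts.items():
        total = sum(tally.values())
        rates[matchup] = {name: tally[name] / total for name in matchup}
    return rates


# Placeholder votes for illustration only; not data from the evaluation.
votes = [
    ("system_x", "system_y", "system_x"),
    ("system_x", "system_y", "system_x"),
    ("system_y", "system_x", "system_x"),
    ("system_x", "system_y", "system_y"),
]
print(win_rates(votes))
# {('system_x', 'system_y'): {'system_x': 0.75, 'system_y': 0.25}}
```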
Paralinguistic Tags: More Than Words
Marked by Resemble AI's PerTh Watermarker
Every audio file generated by Chatterbox includes Resemble AI’s PerTh (Perceptual Threshold) Watermarker, a deep neural network watermarker that embeds data in an imperceptible and difficult-to-detect way. This isn’t just a feature; it’s our commitment to responsible AI deployment.
The watermarker operates on principles of psychoacoustics: it exploits the way we perceive audio to find sounds that are inaudible, then encodes data into those regions.
Ready to Build with Generative Voice?
Join developers already using Chatterbox in production