Chatterbox Turbo

Fast, expressive, open-source TTS – authenticated by default.

The only open-source TTS with built-in watermarking. Up to 6× faster than real-time on a GPU. Paralinguistic prompting. Zero-shot cloning. MIT licensed.

TRUSTED BY DEVELOPERS AT

Age of Learning
Red Games
Netflix

Voice Cloning with 5 seconds of reference audio

Outperforms proprietary closed-source models head-to-head.

Fast Enough for Real-Time, Trustworthy in Production

Chatterbox Turbo is the first open-source TTS that doesn’t make you trade one strength for another: it’s fast, expressive, and MIT licensed. And every output is authenticated with PerTh watermarking, so you can build voice AI that’s both open and accountable.

6× faster than real-time on a GPU

350M parameters and 75 ms latency

5 seconds of audio for voice cloning
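To put the speed figures in concrete terms: TTS speed is commonly reported as a real-time factor (RTF), the time spent synthesizing divided by the duration of the audio produced. By that definition, "6× faster than real-time" corresponds to an RTF of about 0.17. Latency (the 75 ms figure) is a separate metric, usually the delay before audio begins, and RTF does not capture it. The helpers below are a hypothetical sketch for measuring RTF on your own hardware. `synthesize` is a stand-in for whatever TTS call you are timing, not a Chatterbox function.

```python
import time
from typing import Callable, Tuple


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the generated audio.

    RTF < 1 means faster than real time; the speedup over real time is 1 / RTF.
    """
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def time_synthesis(synthesize: Callable[[str], Tuple[int, int]], text: str) -> float:
    """Time a call and return its RTF.

    `synthesize` is hypothetical: it is assumed to return (num_samples, sample_rate).
    """
    start = time.perf_counter()
    num_samples, sample_rate = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, num_samples / sample_rate)


# Worked example: 10 s of speech generated in 10/6 s of wall-clock time.
rtf = real_time_factor(10 / 6, 10.0)
print(f"RTF = {rtf:.3f}, speedup = {1 / rtf:.1f}x")  # RTF = 0.167, speedup = 6.0x
```

Timing only the synthesis call keeps model loading and file I/O out of the measurement, which is usually what a real-time claim refers to.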

Built for Production, Designed for Developers


Unique Emotion Control

The first open-source model with emotion exaggeration control. Adjust intensity from monotone to dramatically expressive with a single parameter.

Real-Time Voice Synthesis

Faster-than-real-time inference with alignment-informed generation. Perfect for real-time applications, voice assistants, and interactive media.

Zero-Shot Voice Cloning

Clone any voice with just a few seconds of reference audio. No training required. Includes easy voice conversion scripts.
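Cloning works from a short reference clip, so it can help to check the clip's length before passing it to the model. The sketch below uses only Python's standard `wave` module and assumes the reference is a PCM WAV file. Treating the "5 seconds of reference audio" figure on this page as a minimum is our assumption, and the helper names (`reference_clip_seconds`, `is_usable_reference`, `silent_wav`) are hypothetical, not part of Chatterbox.

```python
import io
import wave

# Assumption: read the page's "5 seconds of reference audio" figure as a minimum length.
MIN_REFERENCE_SECONDS = 5.0


def reference_clip_seconds(source) -> float:
    """Return the duration in seconds of a PCM WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(source) -> bool:
    """True if the clip meets the assumed minimum reference length."""
    return reference_clip_seconds(source) >= MIN_REFERENCE_SECONDS


def silent_wav(seconds: float, sample_rate: int = 24000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


print(is_usable_reference(silent_wav(6.0)))  # True
print(is_usable_reference(silent_wav(3.0)))  # False
```

In practice you would pass a file path instead of an in-memory buffer; `wave.open` accepts either.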

Watermarked & Secure

Built-in watermarking for generated audio. Know when content was created by Chatterbox while maintaining high audio quality.

Paralinguistic Prompting

Text-based tags that tell the model to perform natural vocal reactions in the cloned voice.  Supported tags include sigh, gasp, cough and more.

Developer First

Simple pip install, comprehensive docs. Built by developers, for developers. Available on GitHub and Hugging Face.

Head-to-Head Testing

We ran a head-to-head evaluation on Podonos to assess how well Resemble AI’s Chatterbox Turbo, ElevenLabs Turbo 2.5, Cartesia Sonic 3, and VibeVoice 7B generate natural, high-quality speech. Every system generated audio from 5- to 10-second reference clips and identical text inputs (zero-shot, with no prompt engineering or audio processing).

Head-to-Head Win Rates

Comparison of preference rates (%) across all matchups.

Paralinguistic Tags: More Than Words

Voice AI that sounds human, complete with reactions and emotions expressed in sighs, laughs and more.

Chatterbox Turbo introduces paralinguistic prompting: text-based tags that tell the model to perform natural vocal reactions in the cloned voice.

The model performs these reactions naturally, in the same cloned voice, with the same emotional tone—no post-processing, no splicing, no manual audio editing.
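This page names the supported reactions (sigh, gasp, cough, laugh, "and more") but not the exact tag syntax. Assuming tags are written inline in square brackets, such as `[sigh]`, a small pre-flight check can catch misspelled tags before text reaches the model. The bracket syntax, the `SUPPORTED_TAGS` set (limited to the tags mentioned on this page), and `find_tags` are all assumptions, not the documented Chatterbox interface.

```python
import re
from typing import List, Tuple

# Assumption: only the tags named on this page; the model is said to support more.
SUPPORTED_TAGS = {"sigh", "gasp", "cough", "laugh"}

# Assumption: tags appear inline as [tag], e.g. "Well [sigh] fine."
TAG_PATTERN = re.compile(r"\[([A-Za-z_]+)\]")


def find_tags(text: str) -> Tuple[List[str], List[str]]:
    """Split bracketed tags in `text` into (known, unknown) lists, in order of appearance."""
    known: List[str] = []
    unknown: List[str] = []
    for match in TAG_PATTERN.finditer(text):
        tag = match.group(1).lower()  # normalize case, e.g. [Cough] -> cough
        (known if tag in SUPPORTED_TAGS else unknown).append(tag)
    return known, unknown


known, unknown = find_tags("Oh no [gasp] I forgot the keys [sigh] again [sneze].")
print(known)    # ['gasp', 'sigh']
print(unknown)  # ['sneze']
```

Because more tags exist than are listed here, treat anything in `unknown` as a prompt to double-check the spelling rather than as a hard error.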


Marked by Resemble AI’s PerTh Watermarker

Every audio file generated by Chatterbox includes Resemble AI’s PerTh (Perceptual Threshold) Watermarker — a deep neural network watermarker that embeds data in an imperceptible and difficult-to-detect way. This isn’t just a feature; it’s our commitment to responsible AI deployment.

The watermarker operates on principles of psychoacoustics, exploiting the way we perceive audio to find regions of the signal where changes are inaudible and then encoding data into those regions.
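PerTh’s internals are not described here, so the sketch below does not reproduce it. Instead it illustrates the general embed-and-detect round trip with a deliberately simple, swapped-in technique: spread-spectrum watermarking, where a low-amplitude pseudo-random ±1 sequence derived from a secret key is added to the signal and later detected by correlation. It has no psychoacoustic model (it does not search for inaudible regions), and every name in it is hypothetical.

```python
import math
import random
from typing import List

SAMPLE_RATE = 16000


def key_sequence(key: int, n: int) -> List[float]:
    """Pseudo-random +1/-1 chip sequence derived from a secret integer key."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def embed(signal: List[float], key: int, strength: float = 0.02) -> List[float]:
    """Add the keyed chip sequence at a fixed low amplitude (no perceptual shaping)."""
    chips = key_sequence(key, len(signal))
    return [s + strength * c for s, c in zip(signal, chips)]


def correlate(signal: List[float], key: int) -> float:
    """Mean of signal * chips: close to `strength` if marked with this key, close to 0 otherwise."""
    chips = key_sequence(key, len(signal))
    return sum(s * c for s, c in zip(signal, chips)) / len(signal)


def is_marked(signal: List[float], key: int, strength: float = 0.02) -> bool:
    """Detect by comparing the correlation against half the embedding strength."""
    return correlate(signal, key) > strength / 2


# Host audio: 3 seconds of a 440 Hz tone at amplitude 0.5.
host = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(3 * SAMPLE_RATE)]
marked = embed(host, key=1234)

print(is_marked(marked, key=1234))  # True
print(is_marked(host, key=1234))    # False
print(is_marked(marked, key=999))   # False (wrong key)
```

Detection works because the host audio is uncorrelated with the keyed sequence, so averaging over many samples suppresses it while the embedded mark adds up. This toy makes no attempt to be inaudible or robust to editing; those are the hard parts a production watermarker has to solve.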


Ready to Build with Generative Voice?

Join developers already using Chatterbox in production