Resemble Text-to-Speech

Generate natural speech across 100 languages, in milliseconds.

Chatterbox delivers real-time speech synthesis, custom pronunciation, and PerTH watermarking. Available on cloud, on-premise, or air-gapped.

Trusted by
ModelS
Chatterbox
MIT OPEN SOURCE
Production-grade TTS with zero-shot voice cloning. Outperforms ElevenLabs in blind evaluations. First open-source model with emotion exaggeration control.
Sub-200ms
Voice cloning
Emotion control
PerTh watermarked
Chatterbox Turbo
NEW
350M parameter architecture optimized for voice agents. 1-step decoder, native paralinguistic tags — [cough], [laugh], [chuckle]. Built for latency-critical production.
Real-time
350M params
Paralinguistic
ONNX available
Multilingual
23-language TTS with voice cloning across Arabic, Chinese, German, French, Hindi, Japanese, Korean, Spanish, and 15 more. MIT licensed.
23 languages
Voice cloning
MIT licensed
Chatterbox Pro
ENTERPRISE
Enterprise tier with custom fine-tuning on brand vocabulary, SLAs, guaranteed uptime, sub-200ms streaming, and advanced watermarking and detection.
Custom fine-tuning
SLA
On-prem
23 languanges
RESEMBLE TTS PRODUCTION COVERAGE

Can generic TTS handle your edge cases? Chatterbox does.

Drug name or brand terminology
PRONOUNCED CORRECTLY
PRONOUNCED CORRECTLY
Emotion and vocal character
CONSISTENT OUTPUT
CONSISTENT OUTPUT
Live agent response at 200ms TTFS
NO PERCEPTIBLE DELAY
NO PERCEPTIBLE DELAY
Deployed on your own infrastructure
FULL CONTROL
FULL CONTROL
Voice clone across 23 languages
ACCENT RETAINED
ACCENT RETAINED
Sighs, laughs, and gasps built in
NO POST-PROCESSING
NO POST-PROCESSING
OUR CAPABILITIES

Speech that responds instantly, pronounces correctly, and sounds human.

The details that break generic TTS are the ones we designed around.
Real time streaming
Delivers speech via WebSocket at 200ms TTFS for conversational agents, HTTP streaming for longer-form content, and synchronous responses for notifications. One API, with three modes matched to how your application works.
Custom pronounciation
Generates multiple pronunciation variants for any term and lock in the best one. The approved pronunciation applies across every voice, session, and API call automatically.
Emotion and paralinguistic control
Adjusts emotion intensity from flat delivery to dramatically expressive with a single parameter. Chatterbox Turbo adds natural vocal reactions inline — sighs, laughs, coughs, gasps without post-processing.
200ms
Time to first speech with Chatterbox Turbo
10M+
Hugging Face downloads, including 2.2M in a single month
100
Languages and regional dialects supported on the managed platform
Resemble tts use cases

Built for industries where the voice has to be right.

Standard TTS handles the easy cases. These don't have room for error.

Voice agents & IVR
The problem:
Generic TTS introduces perceptible delay and flat delivery that breaks conversational trust.
Resemble AI solution:
200ms TTFS via WebSocket. Natural vocal reactions built in. Agents sound human.
Contact centers
Industry risk:
Multiple TTS vendors per language, each with different latency and quality profiles.
Resemble AI solution:
One API. 100 languages. Single integration point for the entire contact center stack.
Developer platforms
The problem:
Vendor lock-in and data sovereignty concerns block enterprise AI voice deployments.
Resemble AI solution:
MIT-licensed. Deploy cloud, on-premise, or air-gapped. No vendor lock-in.
Media & Localization
The problem:
Voice clones lose accent and character when translated across languages.
Resemble AI solution:
Zero-shot cloning retains vocal identity across 23 languages from 5 seconds of audio.
Healthcare & HealthTech
The problem:
Drug names, fund tickers, and legal citations mispronounce on every generic engine.
Resemble AI solution:
Lock the correct pronunciation once. It applies across every voice and API call.
Regulated industries
The problem:
AI-generated audio output not traceable for compliance or governance review.
Resemble AI solution:
Every file can add a PerTh neural watermark. Provenance is verifiable at any point.
SOC2 Type II
Enterprise plans
EU AI Act ready
Mandatory Aug 2026
GDPR Compatible
On-prem available
HIPAA Compatible
Air-gapped deployment
SSO / SAML
Enterprise identity
C2PA Standard
Content provenance
TRUSTED IN PRODUCTION

Trusted where it matters most

When the stakes are real, rely on Resemble AI
AI Voice Reconstruction
AI narration for The Andy Warhol Diaries, generated from three minutes of source audio.
Personalized TTS
354,000 personalized audio messages generated. 7x revenue impact on fan engagement.
Multilingual Audio
Production-grade voice generation at scale for educational content across multiple languages.
Consumer Voice Cloning
Parents record 25 sentences. Resemble clones the voice. Bedtime stories narrated in the parent's own voice. 4.8 App Store rating.
INTEGRATIONs AND DEPLOYMENTs

Go live in hours, not sprints.

Rest API
Python SDK
Node.js SDK
JavaScript SDK
MCP for Cursor & Claude Code
On-prem and air-gapped environments available
WebSocket streaming
Frequently asked questions
Does TTS affect audio quality at production grade?
No. Quality metrics remain consistent across cloud API, on-premise, and open-source deployments.
What languages are supported?
100 languages and regional dialects on the managed platform. 23 languages with full zero-shot voice cloning via Chatterbox Multilingual: English, Spanish, French, German, Arabic, Portuguese, Russian, Turkish, Italian, Danish, Finnish, Japanese, Korean, Mandarin, Dutch, Slovak, Swedish, Vietnamese, Norwegian, Polish, Swahili, Hindi, and Hebrew.
How does custom pronunciation work?
Submit a term through the vocabulary engine and it returns multiple pronunciation variants. Select the correct one in one click. The locked pronunciation applies across all voices, sessions, and API calls. No phonetic transcription required.
Can I clone a voice without training data?
Yes. 5 seconds of reference audio, no training. Accent and vocal character retained across all 23 languages.
IIs the model truly open-source?
Yes. Chatterbox is MIT-licensed on GitHub and Hugging Face. Run it on your own infrastructure, audit the weights, keep your data sovereign.
What are the enterprise deployment options and SLAs?
Cloud API, on-premise, and air-gapped environments. Chatterbox Multilingual Pro includes sub-200ms latency SLAs, guaranteed uptime, and custom fine-tuning on brand vocabulary. SLA documentation available on request.
How quickly can we integrate?
Most teams are live within a day using the REST API or an available SDK. Open-source deployment via Hugging Face requires no API key.
Get complete generative AI security
Join thousands of developers and enterprises securing with Resemble AI