Chatterbox Multilingual · Open Source TTS

Hollywood-quality voice synthesis, now in 23 languages.

Chatterbox Multilingual is our production-grade, open-source TTS model with expressive emotion control, PerTh watermarking, and zero-shot voice cloning — free to download, free to build on.

Use Chatterbox Multilingual → · Get on GitHub

23

Languages supported at launch

MIT

Permissive open-source license

Zero-shot

Voice cloning from a short sample

Trusted by the open-source community

10M+

Hugging Face downloads

24K+

GitHub stars

63.75%

Preferred over ElevenLabs in blind tests

23

Languages from day one

Capabilities

State-of-the-art voice cloning in 23 languages

Users expect apps and agents to sound human, speak in their native language, and deliver content with authentic tone and emotion. Chatterbox Multilingual was built to meet that demand.

🌐

Breadth of languages

23 supported languages from launch — Arabic, Chinese, English, French, German, Hindi, Japanese, Korean, Spanish and more — with zero-shot voice cloning across every one.

🎤

Expressive control

Fine-tune delivery with emotion and intensity settings. Dial in warmth, urgency, or calm to match the moment — not just the text.

🛡

Enterprise reliability

Ultra-stable inference with PerTh watermarking baked into every output, so every voice you generate is traceable and authenticated at creation.

Languages

All 23 languages, day one.

Whether you’re designing a voice agent for customer support, a language-learning app, or a global gaming experience, Chatterbox Multilingual ships with native support for the languages your users actually speak.

🇸🇦 Arabic (ar)
🇩🇰 Danish (da)
🇩🇪 German (de)
🇬🇷 Greek (el)
🇬🇧 English (en)
🇪🇸 Spanish (es)
🇫🇮 Finnish (fi)
🇫🇷 French (fr)
🇮🇱 Hebrew (he)
🇮🇳 Hindi (hi)
🇮🇹 Italian (it)
🇯🇵 Japanese (ja)
🇰🇷 Korean (ko)
🇲🇾 Malay (ms)
🇳🇱 Dutch (nl)
🇳🇴 Norwegian (no)
🇵🇱 Polish (pl)
🇵🇹 Portuguese (pt)
🇷🇺 Russian (ru)
🇸🇪 Swedish (sv)
🇰🇪 Swahili (sw)
🇹🇷 Turkish (tr)
🇨🇳 Chinese (zh)
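
The two-letter codes in the list above are what you pass to the model to choose a language. As a minimal sketch, an app might check a user's locale against that list before it requests speech. `SUPPORTED_LANGUAGES` and `normalize_language` are illustrative names for this example only and are not part of the Chatterbox package.

```python
# Language codes exactly as listed on this page (two-letter tags).
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "da": "Danish", "de": "German", "el": "Greek",
    "en": "English", "es": "Spanish", "fi": "Finnish", "fr": "French",
    "he": "Hebrew", "hi": "Hindi", "it": "Italian", "ja": "Japanese",
    "ko": "Korean", "ms": "Malay", "nl": "Dutch", "no": "Norwegian",
    "pl": "Polish", "pt": "Portuguese", "ru": "Russian", "sv": "Swedish",
    "sw": "Swahili", "tr": "Turkish", "zh": "Chinese",
}


def normalize_language(tag: str) -> str:
    """Reduce a locale tag such as 'pt-BR' or 'zh_CN' to a supported code.

    Hypothetical helper: raises ValueError when the base language is not
    in the list published on this page.
    """
    code = tag.strip().replace("_", "-").split("-")[0].lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {tag!r}")
    return code
```

Regional variants collapse to their base language here, so `pt-BR` and `pt-PT` both resolve to `pt`. Whether that is acceptable depends on your product.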

Samples

Hear it in action

A handful of multilingual samples generated directly from Chatterbox Multilingual — same model, same voice prompt, four different languages.

🇮🇳 Hindi

🇩🇪 German

🇪🇸 Spanish

🇸🇦 Arabic

Generate your own samples in the Resemble app or run the model locally from Hugging Face.

For developers

Six lines of code. Twenty-three languages.

Pull the weights straight from Hugging Face and generate production-quality speech in any supported language. No license key, no rate limits, no platform lock-in.

Open source. MIT licensed. Production-ready.

Chatterbox Multilingual ships as a standard Python package with first-class PyTorch and torchaudio support. Run it on your own GPU, deploy it on your own infra, or layer it into an existing pipeline — the only constraint is your imagination.

Built-in PerTh watermarking means every clip you generate is invisibly tagged at creation, so downstream detection stays possible even after re-encoding.

View on GitHub · Hugging Face →

chatterbox_fr.py

import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Downloads the multilingual weights from Hugging Face on first use.
model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")  # or "cpu"

text = "Bienvenue dans Chatterbox Multilingual."
# language_id takes one of the two-letter codes from the language list, e.g. "fr".
wav = model.generate(text, language_id="fr")
ta.save("sample_fr.wav", wav, model.sr)
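
To produce one clip per language, as in the samples earlier on this page, the single call above can be wrapped in a loop. The sketch below is one way to do that, not an official utility. `render_all` is a hypothetical helper, and it takes the synthesis and save steps as plain callables so the loop can be exercised without a GPU or the model weights.

```python
from pathlib import Path
from typing import Callable, Dict, List


def render_all(
    lines: Dict[str, str],
    synthesize: Callable[[str, str], object],
    save: Callable[[str, object], None],
    out_dir: str = "out",
) -> List[str]:
    """Generate one clip per language and return the paths written.

    lines      maps a language code to the text to speak in that language.
    synthesize receives (text, language_code) and returns audio.
    save       receives (path, audio) and writes the file.
    """
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for lang, text in lines.items():
        wav = synthesize(text, lang)
        path = str(Path(out_dir) / f"sample_{lang}.wav")
        save(path, wav)
        written.append(path)
    return written
```

In practice, `synthesize` would wrap the `model.generate` call from the snippet above and `save` would wrap `ta.save` with the model's sample rate.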

Open source & Pro

Start free. Scale when you’re ready.

The open-source release brings world-class TTS to everyone. For regulated industries and production workloads, Chatterbox Multilingual Pro closes the last mile with fine-tuning, SLAs, and low-latency infrastructure.

Open source

Chatterbox Multilingual

MIT licensed · free forever

23 languages with zero-shot voice cloning
Emotion and intensity controls
PerTh watermarking at creation
Run locally on CPU or GPU
Full model weights on Hugging Face

Get on GitHub →

Pro

Chatterbox Multilingual Pro

For enterprises shipping at scale

Custom fine-tuning on brand vocabulary
Sub-200 ms latency with real-time streaming
Guaranteed uptime and throughput SLAs
Advanced watermarking & deepfake detection
On-prem and private cloud deployment

Talk to our team →

FAQ

Frequently asked questions

Everything you need to know before building with Chatterbox Multilingual.

Is Chatterbox Multilingual free for commercial use?

Yes. The model is released under the MIT license, so you can use it in commercial products, modify it, and redistribute it. PerTh watermarking remains embedded in generated audio so synthetic content stays detectable downstream.

Which languages are supported?

Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish, and Chinese. Zero-shot voice cloning is available in all of them.

How does Chatterbox compare to ElevenLabs?

In blind user evaluations of the original Chatterbox release, listeners preferred Chatterbox over ElevenLabs 63.75% of the time. Chatterbox Multilingual extends that quality bar across 23 languages, with full model weights you can inspect, fine-tune, and host yourself.

Can I run it on my own hardware?

Yes. Chatterbox Multilingual runs on CPU for offline generation and on a consumer or datacenter GPU for real-time inference. Pass device="cuda" or device="cpu" to from_pretrained.
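
If the same script has to run on machines with and without a GPU, the device string can be chosen at startup. This is a minimal sketch. `pick_device` is a hypothetical helper, not part of Chatterbox, and it relies only on PyTorch's `torch.cuda.is_available()`.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed directly as the device argument to from_pretrained.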

What is PerTh watermarking?

PerTh is Resemble’s perceptual-threshold neural watermark. It embeds an inaudible signal into every generated clip that survives compression and re-encoding, so downstream tools can verify audio as AI-generated. That makes it an important guardrail for regulated or high-trust deployments.

Who is Chatterbox Multilingual Pro for?

Pro is for teams that need sub-200 ms streaming latency, custom fine-tuning on brand or domain vocabulary, guaranteed uptime, and enhanced watermarking and detection. Call centers, financial services, and healthcare platforms are typical fits.