Chatterbox Multilingual · Open Source TTS

Hollywood-quality voice synthesis, now in 23 languages.

Chatterbox Multilingual is our production-grade, open-source TTS model with expressive emotion control, PerTh watermarking, and zero-shot voice cloning — free to download, free to build on.

23
Languages supported at launch
MIT
Permissive open-source license
Zero-shot
Voice cloning from a short sample
Trusted by the open-source community
10M+
Hugging Face downloads
24K+
GitHub stars
63.75%
Preferred over ElevenLabs in blind tests
23
Languages from day one

State-of-the-art voice cloning in 23 languages

Users expect apps and agents to sound human, speak in their native language, and deliver content with authentic tone and emotion. Chatterbox Multilingual was built to meet that demand.

🌐

Breadth of languages

23 supported languages from launch — Arabic, Chinese, English, French, German, Hindi, Japanese, Korean, Spanish and more — with zero-shot voice cloning across every one.

🎤

Expressive control

Fine-tune delivery with emotion and intensity settings. Dial in warmth, urgency, or calm to match the moment — not just the text.
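In code, delivery styles like these map naturally to generation parameters. The parameter names `exaggeration` and `cfg_weight` follow the original Chatterbox `generate()` signature; the preset names and numeric values below are illustrative assumptions, not shipped defaults.

```python
# Illustrative presets mapping delivery styles to Chatterbox generation knobs.
# `exaggeration` controls expressiveness (~0.0-1.0); `cfg_weight` affects
# pacing/adherence. The specific values here are assumptions for the sketch.
EMOTION_PRESETS = {
    "calm":   {"exaggeration": 0.3, "cfg_weight": 0.6},
    "warm":   {"exaggeration": 0.5, "cfg_weight": 0.5},
    "urgent": {"exaggeration": 0.8, "cfg_weight": 0.3},
}

def synthesis_kwargs(emotion: str) -> dict:
    """Return generation kwargs for a named delivery style."""
    if emotion not in EMOTION_PRESETS:
        raise ValueError(f"unknown emotion preset: {emotion!r}")
    return dict(EMOTION_PRESETS[emotion])
```

A caller would then splat these into the model call, e.g. `model.generate(text, **synthesis_kwargs("urgent"))`.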

🛡

Enterprise reliability

Ultra-stable inference with PerTh watermarking baked into every output, so every voice you generate is traceable and authenticated at creation.

All 23 languages, day one.

Whether you’re designing a voice agent for customer support, a language-learning app, or a global gaming experience, Chatterbox Multilingual ships with native support for the languages your users actually speak.

🇸🇦
Arabic (ar)
🇩🇰
Danish (da)
🇩🇪
German (de)
🇬🇷
Greek (el)
🇬🇧
English (en)
🇪🇸
Spanish (es)
🇫🇮
Finnish (fi)
🇫🇷
French (fr)
🇮🇱
Hebrew (he)
🇮🇳
Hindi (hi)
🇮🇹
Italian (it)
🇯🇵
Japanese (ja)
🇰🇷
Korean (ko)
🇲🇾
Malay (ms)
🇳🇱
Dutch (nl)
🇳🇴
Norwegian (no)
🇵🇱
Polish (pl)
🇵🇹
Portuguese (pt)
🇷🇺
Russian (ru)
🇸🇪
Swedish (sv)
🇰🇪
Swahili (sw)
🇹🇷
Turkish (tr)
🇨🇳
Chinese (zh)

Hear it in action

A handful of multilingual samples generated directly from Chatterbox Multilingual — same model, same voice prompt, four different languages.

🇮🇳 Hindi
🇩🇪 German
🇪🇸 Spanish
🇸🇦 Arabic

Generate your own samples in the Resemble app or run the model locally from Hugging Face.

Six lines of code. Twenty-three languages.

Pull the weights straight from Hugging Face and generate production-quality speech in any supported language. No license key, no rate limits, no platform lock-in.

Open source. MIT licensed. Production-ready.

Chatterbox Multilingual ships as a standard Python package with first-class PyTorch and torchaudio support. Run it on your own GPU, deploy it on your own infra, or layer it into an existing pipeline — the only constraint is your imagination.

Built-in PerTh watermarking means every clip you generate is invisibly tagged at creation, so downstream detection stays possible even after re-encoding.

chatterbox_fr.py
import torchaudio as ta
from chatterbox.mtl_tts import ChatterboxMultilingualTTS

# Weights download from Hugging Face on first run
model = ChatterboxMultilingualTTS.from_pretrained(device="cuda")  # or "cpu"

text = "Bienvenue dans Chatterbox Multilingual."
wav = model.generate(text, language_id="fr")
# Zero-shot cloning: optionally pass a short reference clip
# wav = model.generate(text, language_id="fr", audio_prompt_path="reference.wav")
ta.save("sample_fr.wav", wav, model.sr)

Start free. Scale when you’re ready.

The open-source release brings world-class TTS to everyone. For regulated industries and production workloads, Chatterbox Multilingual Pro closes the last mile with fine-tuning, SLAs, and low-latency infrastructure.

Open source

Chatterbox Multilingual

MIT licensed · free forever
  • 23 languages with zero-shot voice cloning
  • Emotion and intensity controls
  • PerTh watermarking at creation
  • Run locally on CPU or GPU
  • Full model weights on Hugging Face
Get on GitHub →
Pro

Chatterbox Multilingual Pro

For enterprises shipping at scale
  • Custom fine-tuning on brand vocabulary
  • Sub-200 ms latency with real-time streaming
  • Guaranteed uptime and throughput SLAs
  • Advanced watermarking & deepfake detection
  • On-prem and private cloud deployment
Talk to our team →

Frequently asked questions

Everything you need to know before building with Chatterbox Multilingual.

Can I use Chatterbox Multilingual commercially?
Yes. The model is released under the MIT license, which means you can use it in commercial products, modify it, and redistribute it. PerTh watermarking remains embedded in generated audio so synthetic content stays detectable downstream.

Which languages are supported?
Arabic, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Hindi, Italian, Japanese, Korean, Malay, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Swahili, Turkish, and Chinese. Zero-shot voice cloning is available across every one.

How does it compare to ElevenLabs?
In blind user evaluations of the original Chatterbox release, listeners preferred Chatterbox over ElevenLabs 63.75% of the time. Multilingual extends that quality bar across 23 languages, with full model weights you can inspect, fine-tune, and host yourself.

Can I run it locally?
Yes. Chatterbox Multilingual runs on CPU for offline generation and on a consumer or datacenter GPU for real-time inference. Pass device="cuda" or device="cpu" to from_pretrained.

What is PerTh watermarking?
PerTh is Resemble’s perceptual-threshold neural watermark. It embeds an inaudible signal into every generated clip that survives compression and re-encoding, so downstream tools can verify audio as AI-generated — an important guardrail for regulated or high-trust deployments.

Who is Chatterbox Multilingual Pro for?
Pro is for teams that need sub-200 ms streaming latency, custom fine-tuning on brand or domain vocabulary, guaranteed uptime, and enhanced watermarking and detection. Call centers, financial services, and healthcare platforms are typical fits.