Resemble Voice Creation

Clone any voice or design one that doesn't exist yet.

Two paths to a production-ready voice, powered by the open-source model that outperformed ElevenLabs in blind evaluation.

Model

Powered by Chatterbox

Chatterbox is MIT-licensed and open-source. Clone any voice from 10 seconds of audio, or generate one from a text description. PerTh watermarking available on every output.
~63.75%
Of evaluators preferred Chatterbox over ElevenLabs in an independent blind study.
Benchmarks available upon request.
Resemble AI Voice Creation paths

Choose based on whether you're working from existing audio or building from scratch.

Rapid Clone — 10 seconds of audio
READY IN <1 MIN
Pre-built professional voices
DEPLOY INSTANTLY
Voice Design from a text description
3 OPTIONS RETURNED
On-prem self-hosting via Docker/Kubernetes
MIT LICENSED
Professional Clone — 10–25+ min of audio
TRAINED IN ~40 MIN
PerTh watermark on every output
AVAILABLE
OUR CAPABILITIES

Everything a production voice needs, built in.

Designed to keep your voice consistent across languages, use cases, and API calls.
Custom pronunciation
Generates multiple pronunciation variants for any term. The approved pronunciation applies across every voice, session, and API call automatically.
Multilingual cloning
Clones a voice once and generates speech across 23 languages, with accent and vocal character retained without separate training runs.
Per-use-case variants
Prompt one clone for unlimited variants: conversational, commercial, phone agent, and more, each in natural language.
<1 min
Time to a functional Rapid Clone from 10 seconds of audio
23
Languages supported with zero-shot voice cloning
10M+
Hugging Face downloads, 24,000+ GitHub stars
RESEMBLE VOICE CREATION USE CASES

Built for teams where the voice has to be consistent, accurate, and yours.

Every use case below starts with a problem a generic voice couldn't solve.

Voice agents
The problem:
Generic voices don't fit the brand. Cloning takes too long to iterate on.
Resemble AI solution:
Rapid Clone in under 1 minute with per-use-case variants available instantly.
Games
The problem:
Casting and recording 40 characters is slow, expensive, and hard to iterate on.
Resemble AI solution:
Voice Design generates distinct characters from a text description. No casting, no studio time.
Audiobooks & long-form
The problem:
Long-form narration demands a consistent voice with full emotional range across hours of content.
Resemble AI solution:
Professional Clone trains on 10 to 25+ minutes of audio and delivers consistent, expressive output at any length.
Media & Podcasts
The problem:
Localizing content means re-recording the host in every language.
Resemble AI solution:
Clone the host voice once and generate in 23 languages with accent and emotion retained.
Healthcare & HealthTech
The problem:
Generic engines mispronounce drug names, legal terms, and brand vocabulary.
Resemble AI solution:
Lock the correct pronunciation once. Applies across every voice and API call.
Regulated industries
The problem:
Sensitive voice data cannot leave the organization's infrastructure.
Resemble AI solution:
MIT-licensed. Self-host via pip install. On-premise via Docker/Kubernetes. No forced cloud dependency.
SOC 2 Type II
Enterprise plans
EU AI Act ready
Mandatory Aug 2026
GDPR Compatible
On-prem available
HIPAA Compatible
Air-gapped deployment
SSO / SAML
Enterprise identity
C2PA Standard
Content provenance
Trusted in production

Trusted where it matters most

When the stakes are real, rely on Resemble AI
AI Voice Reconstruction
AI narration for The Andy Warhol Diaries, generated from three minutes of source audio.
Personalized TTS
354,000 personalized audio messages generated. 7x revenue impact on fan engagement.
Multilingual Audio
Production-grade voice generation at scale for educational content across multiple languages.
Consumer Voice Cloning
Parents record 25 sentences. Resemble clones the voice. Bedtime stories narrated in the parent's own voice. 4.8 App Store rating.
INTEGRATIONS AND DEPLOYMENTS

Go live in hours, not sprints.

REST API
Python SDK
Node.js SDK
WebSocket streaming
ZIP or WAV dataset upload for cloning
On-prem available
Webhook callback on training completion
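
A training-completion webhook can be received with nothing more than the Python standard library. The sketch below is illustrative only: the payload fields `voice_id` and `status`, the function names, and the port are assumptions rather than the documented callback format, so check the real field names before relying on it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_training_callback(body: bytes) -> dict:
    """Extract the voice identifier and status from a JSON callback body.

    The field names here are hypothetical placeholders.
    """
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("callback body must be a JSON object")
    return {
        "voice_id": payload.get("voice_id"),          # assumed field name
        "status": payload.get("status", "unknown"),   # assumed field name
    }


class TrainingCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = parse_training_callback(self.rfile.read(length))
        except (ValueError, UnicodeDecodeError):
            # Malformed JSON or a non-object body: reject the request.
            self.send_response(400)
            self.end_headers()
            return
        print(f"voice {event['voice_id']} reported status {event['status']}")
        self.send_response(204)  # acknowledge with an empty response
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Block forever, handling callbacks on the given port."""
    HTTPServer(("0.0.0.0", port), TrainingCallbackHandler).serve_forever()
```

Call `serve()` to start listening. A production receiver would also authenticate the sender, which this sketch leaves out.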
Frequently asked questions
What is the difference between Rapid Clone and Professional Clone?
Rapid Clone needs 10 seconds of audio and is ready in under 1 minute. Professional Clone needs 10 to 25+ minutes of audio, trains in ~40 minutes, and produces a voice with full emotional range that is nearly indistinguishable from the source.
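
As a rough guide, the choice between the two can be expressed as a check on how much source audio you have. The thresholds below come from the figures on this page. The function name and return values are illustrative and are not part of any Resemble API.

```python
from typing import Optional

RAPID_MIN_SECONDS = 10               # Rapid Clone: 10 seconds of audio
PROFESSIONAL_MIN_SECONDS = 10 * 60   # Professional Clone: 10 to 25+ minutes


def choose_clone_path(audio_seconds: float) -> Optional[str]:
    """Return the most capable clone path the available audio supports."""
    if audio_seconds >= PROFESSIONAL_MIN_SECONDS:
        return "professional"
    if audio_seconds >= RAPID_MIN_SECONDS:
        return "rapid"
    return None  # not enough audio for either path


print(choose_clone_path(45))        # rapid
print(choose_clone_path(15 * 60))   # professional
```

Having enough audio for a Professional Clone does not rule out a Rapid Clone. A team may still start with Rapid Clone to iterate quickly and train the Professional Clone later.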
How does Voice Design work?
Describe the voice you want in natural language: age, accent, tone, style. The API returns 3 distinct candidates. Choose one and generate speech immediately, even while the voice finishes building in the background.
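
One way to keep descriptions consistent across a team is to build them from the four attributes named above. This is a minimal sketch under stated assumptions: `VoiceBrief` and `pick_candidate` are hypothetical helpers, no real endpoint is called, and the candidate list stands in for whatever identifiers the API actually returns.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class VoiceBrief:
    """Structured inputs for a natural-language voice description."""
    age: str
    accent: str
    tone: str
    style: str

    def to_description(self) -> str:
        return (
            f"A {self.age} voice with a {self.accent} accent, "
            f"{self.tone} in tone, suited to {self.style}."
        )


def pick_candidate(candidates: Sequence[str], index: int) -> str:
    """Select one of the three returned candidates by position (0-2)."""
    if len(candidates) != 3:
        raise ValueError("expected exactly 3 candidates")
    if not 0 <= index < 3:
        raise IndexError("index must be 0, 1, or 2")
    return candidates[index]


brief = VoiceBrief("middle-aged", "Scottish", "warm", "audiobook narration")
print(brief.to_description())
```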
Can a cloned voice speak multiple languages?
Yes. Chatterbox Multilingual supports zero-shot cloning across 23 languages. Clone once in any language and generate speech across all 23 without separate training runs. Accent and vocal character are retained.
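
With the open-source package, this might look like the sketch below. It assumes the `chatterbox-tts` package exposes `ChatterboxMultilingualTTS` with a `generate(text, language_id=..., audio_prompt_path=...)` method, as in the project README at the time of writing. Verify this against the release you install. The helper names and output layout are illustrative.

```python
from pathlib import Path
from typing import Dict, List


def output_path(out_dir: str, language_id: str) -> Path:
    """Build the file path for one language's output."""
    return Path(out_dir) / f"speech-{language_id}.wav"


def synthesize_in_languages(
    lines: Dict[str, str],      # language code -> text in that language
    reference_wav: str,         # short recording of the voice to clone
    out_dir: str = "out",
    device: str = "cpu",
) -> List[Path]:
    # Imported lazily so output_path() works without the model installed.
    import torchaudio as ta
    from chatterbox.mtl_tts import ChatterboxMultilingualTTS

    model = ChatterboxMultilingualTTS.from_pretrained(device=device)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    written = []
    for language_id, text in lines.items():
        # The same reference clip is reused for every language.
        wav = model.generate(
            text, language_id=language_id, audio_prompt_path=reference_wav
        )
        path = output_path(out_dir, language_id)
        ta.save(str(path), wav, model.sr)
        written.append(path)
    return written
```

For example, `synthesize_in_languages({"fr": "Bonjour.", "es": "Hola."}, "host.wav")` would write one file per language from a single reference clip.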
Is consent required for voice cloning?
Yes, for Professional Clone. Explicit, verifiable consent from the voice talent is required before training data is uploaded. Consent workflows are built into the platform.
Can I create multiple variants of the same cloned voice?
Yes. Once a voice is cloned, prompt it in natural language for different use cases: conversational, commercial, phone agent, and more. Each variant is saved and reusable via API.
What are the deployment options?
Cloud API, open-source self-hosting via pip install, and on-premise via Docker/Kubernetes. Business plan or higher required for the Voice Cloning API. SLA documentation available on request.
How quickly can we integrate?
Most teams are live within a day using the REST API or SDK. Open-source deployment via Hugging Face requires no API key.
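
For the open-source route, a local clone from a short reference clip might look like this. It is a sketch, not an official quickstart: it assumes `pip install chatterbox-tts` and the `ChatterboxTTS.from_pretrained` / `generate` calls shown in the project README, and `clone_and_speak` is a hypothetical wrapper. Model weights are downloaded from Hugging Face on first use.

```python
from pathlib import Path


def clone_and_speak(
    text: str,
    reference_wav: str,
    out_path: str = "cloned.wav",
    device: str = "cpu",
) -> str:
    """Generate `text` in the voice heard in `reference_wav`."""
    # Fail fast before loading the model if the reference clip is missing.
    if not Path(reference_wav).is_file():
        raise FileNotFoundError(f"reference audio not found: {reference_wav}")

    # Imported lazily: these packages are only needed for synthesis.
    import torchaudio as ta
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, audio_prompt_path=reference_wav)
    ta.save(out_path, wav, model.sr)
    return out_path
```

Calling `clone_and_speak("Welcome back.", "reference.wav")` writes `cloned.wav` to the working directory. Pass `device="cuda"` when a GPU is available.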
Get complete generative AI security
Join thousands of developers and enterprises securing with Resemble AI