Models

Secure with open foundational models

Chatterbox is open source and MIT-licensed because we believe in accessible voice AI. DETECT-3B exists because we understand better than anyone what those models can do in the wrong hands.

RESEMBLE AI MODELS

Our model portfolio

Production-grade models for voice generation, watermarking, and deepfake detection. Available via API or on-premises deployment.

Generate — Voice AI
Chatterbox
MIT OPEN SOURCE
Production-grade TTS with zero-shot voice cloning. Outperforms ElevenLabs in blind evaluations. First open-source model with emotion exaggeration control.
Sub-200ms
Voice cloning
Emotion control
PerTh watermarked
Chatterbox Turbo
NEW
350M-parameter architecture optimized for voice agents, with a 1-step decoder and native paralinguistic tags such as [cough], [laugh], and [chuckle]. Built for latency-critical production workloads.
Real-time
350M params
Paralinguistic
ONNX available
Multilingual
23-language TTS with voice cloning across Arabic, Chinese, German, French, Hindi, Japanese, Korean, Spanish, and 15 more. MIT licensed.
23 languages
Voice cloning
MIT licensed
Chatterbox Pro
ENTERPRISE
Enterprise tier with custom fine-tuning on brand vocabulary, SLAs, guaranteed uptime, sub-200ms streaming, and advanced watermarking and detection.
Custom fine-tuning
SLA
On-prem
23 languages
Verify — Watermarking
PerTh Watermarker
Perceptual Threshold (PerTh) is a deep neural network that embeds imperceptible, psychoacoustically masked watermarks into audio at generation time. The watermarks survive MP3 compression, audio editing, noise, and codec transforms, with ~100% detection accuracy. Every Chatterbox output is watermarked by default.
~100% detection accuracy
Survives compression
Audio
On-prem
Resemblyzer
OPEN SOURCE
Deep learning voice encoder that derives a high-dimensional speaker representation from a few seconds of audio. Used for voice authentication, speaker diarization, and similarity scoring.
Speaker embedding
Voice similarity
Diarization
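Similarity scoring with speaker embeddings typically comes down to comparing two vectors by cosine similarity. The sketch below is a minimal, stdlib-only illustration under stated assumptions: embeddings are plain lists of floats, and the toy 4-dimensional vectors and the 0.75 decision threshold are hypothetical placeholders, not values published by Resemble AI. Resemblyzer's `VoiceEncoder.embed_utterance` returns L2-normalized vectors, in which case cosine similarity reduces to a plain dot product.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.75) -> bool:
    # Hypothetical decision threshold; calibrate it on your own labeled pairs.
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for real (much higher-dimensional) embeddings.
enrolled = [0.6, 0.8, 0.0, 0.0]
probe_same = [0.55, 0.83, 0.05, 0.0]
probe_other = [0.0, 0.1, 0.99, 0.1]

print(round(cosine_similarity(enrolled, probe_same), 3))
print(same_speaker(enrolled, probe_same), same_speaker(enrolled, probe_other))
```

For authentication, the threshold trades false accepts against false rejects, so it is usually tuned per deployment rather than fixed.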
Detect — Deepfake detection
DETECT-3B Omni
NEW
3-billion-parameter multimodal detection model. The only deepfake detector covering audio, image, and video in a single unified architecture. Ranked #1 on the Speech DeepFake Arena benchmark. Zero-day coverage for new generative models in under an hour.
#1 Speech DeepFake Arena
<300ms detection
3B parameters
160+ models
40+ languages
On-prem
Resemble Intelligence
EXPLAINABILITY LAYER
Know why, not just what. Resemble Intelligence pairs DETECT-3B Omni's output with Gemini 3 Flash to generate human-readable forensic explanations in real time, surfacing which artifacts triggered a flag and why. 4× faster than Gemini 2.5 Pro. Built for compliance teams, legal review, and trust and safety workflows that need evidence, not just a score.
Human-readable reports
Real-time
Audit-ready
Powered by Gemini 3 Flash
DEEPFAKE DETECTION COVERAGE

Models we detect

DETECT-3B Omni is battle-tested against 160+ generative AI models. Coverage spans every major audio, image, and video generator — with zero-day updates when new models launch.

Audio
ElevenLabs (Text-to-speech): Covered
OpenAI TTS (TTS-1 / TTS-1 HD): Covered
Azure TTS (Microsoft Neural TTS): Covered
Google TTS (WaveNet / Neural2): Covered
AWS Polly (Neural / Standard): Covered
Suno (AI music generation): Covered
Udio (AI music generation): Covered
PlayHT (Text-to-speech): Covered
Murf (Text-to-speech): Covered
Descript (Overdub / TTS): Covered

Image
DALL-E 3 (OpenAI): 98% accuracy
Midjourney (v6 / v7): 98% accuracy
Stable Diffusion (SDXL / SD 3.5): 94% accuracy
Nano Banana (Image generation): Covered
Flux (Black Forest Labs v2): 99% accuracy
Gemini (Imagen 3 / 2.0 Flash): 99% accuracy
GPT-4o (OpenAI image gen): 99% accuracy
StyleGAN (v2 / v3): >99% accuracy
Ideogram (v2 / v3): Covered
Leonardo AI (Image generation): Covered

Video
Sora (OpenAI): Covered
Veo (Google DeepMind): >99% accuracy
Runway (Gen-3 Alpha): Covered
HeyGen (Avatar video): Covered
Pika (Video generation): Covered
Kling (Kuaishou): Covered
Seedance (ByteDance): Covered
Synthesia (AI avatar video): Covered
D-ID (Talking avatar): Covered

Plus...
Audio: Bark, VALL-E, VALL-E2, YourTTS, Tortoise TTS, XTTS, StyleTTS 2, MetaVoice, Kokoro, VoiceBox, NaturalSpeech 3, HierSpeech++, and over 100 more
Image: Stable Diffusion XL, Latent Diffusion, eDiff-I, GLIDE, Kandinsky, DeepFloyd IF, Playground v3, Grok Aurora, Recraft v3, Lumina Image, Emu (Meta), and more
Video: CogVideoX, Wan 2.1, Open-Sora, ModelScope, VideoCrafter, AnimateDiff, VideoPoet, Make-A-Video, Gen-2, Morph Studio, Mochi 1, HunyuanVideo

AVAILABLE ON-PREM

Run every model inside your perimeter — or in the cloud

Every Resemble AI model, covering voice generation, watermarking, and deepfake detection, is available for fully air-gapped on-premises deployment. No telemetry. No external API calls. Your data stays where it should.
Model access
Chatterbox and DETECT-3B Omni with full model weights on your GPUs.
Real-time analysis
Real-time audio, image, and video analysis behind your firewall.
Zero dependencies
Kubernetes and local Python packages — no cloud dependencies.
Compliance ready
Meets EU AI Act, HIPAA, SOC 2, and financial services data residency requirements.
Get complete generative AI security
Join thousands of developers and enterprises securing their generative AI with Resemble AI