Deepfake detection model

DETECT-2B deepfake detection, rebuilt from the ground up

A groundbreaking approach to deepfake detection that combines efficient architecture with unparalleled accuracy across diverse languages and generation methods.

>94%
Accuracy
200ms
To prediction
30+
Languages supported

A major leap forward in model architecture, training data, and overall performance.

As generative AI evolves, so does the sophistication of synthetic audio. DETECT-2B builds on the foundation of our original Detect model with an ensemble architecture, self-supervised representation learning, and advanced sequence modeling — robust enough to spot deepfakes in the wild.

01

Ensemble of sub-models

Multiple complementary sub-models are fused into a single prediction. Each captures a different signal — from low-level acoustic artifacts to high-level sequential patterns indicative of synthesis.
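The fusion step can be sketched as a weighted average of per-sub-model fakeness scores. This is an illustrative sketch only: the sub-model names, weights, and actual fusion rule inside DETECT-2B are not public.

```python
# Hypothetical sketch: sub-model set and weights are illustrative,
# not DETECT-2B's actual configuration.

def fuse_scores(sub_model_scores, weights=None):
    """Fuse per-sub-model fakeness scores (each in [0, 1]) into one score."""
    if weights is None:
        weights = [1.0] * len(sub_model_scores)  # unweighted mean
    total = sum(w * s for w, s in zip(weights, sub_model_scores))
    return total / sum(weights)

# e.g. an acoustic-artifact model, a spectral model, and a sequence model
scores = [0.91, 0.78, 0.85]
fused = fuse_scores(scores, weights=[0.5, 0.2, 0.3])
```

In practice the weights would be learned or tuned on validation data rather than set by hand.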

02

Self-supervised representations

Pre-trained audio representation models like Wav2Vec2 give DETECT-2B a rich foundation of language-agnostic features, learned from massive amounts of unlabeled audio.

03

Efficient fine-tuning

Adaptation modules inserted into key layers of a frozen backbone learn to shift attention toward subtle deepfake artifacts — without retraining from scratch.
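A minimal sketch of the idea, assuming a generic bottleneck adapter (down-project, nonlinearity, up-project, residual add). The actual module design and placement inside DETECT-2B are not public; this only illustrates why adapters are parameter efficient.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class BottleneckAdapter:
    """Small trainable module added to a frozen layer's output.

    Only these weights are updated during fine-tuning; the backbone
    (e.g. a Wav2Vec2 encoder) stays frozen.
    """
    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        self.down = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                     for _ in range(bottleneck)]
        self.up = [[rng.uniform(-0.1, 0.1) for _ in range(bottleneck)]
                   for _ in range(dim)]

    def __call__(self, hidden):
        # Residual connection: the adapter's output is added to the
        # frozen activation, so it can nudge attention toward artifacts
        # without rewriting the backbone's features.
        delta = matvec(self.up, relu(matvec(self.down, hidden)))
        return [h + d for h, d in zip(hidden, delta)]
```

Because only the down- and up-projection matrices train, the trainable parameter count scales with the bottleneck width, not the backbone size.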

04

Mamba-SSM sequence modeling

State Space Models bring probabilistic temporal dynamics to the classifier, adapting to observed audio features and surfacing inconsistencies traditional classifiers miss.
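The core recurrence behind state space models can be sketched as the generic linear form h_t = a·h_(t-1) + b·x_t, y_t = c·h_t. This is a scalar toy, not Mamba's actual selective-scan implementation; it only shows how state carries temporal context across frames.

```python
def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    """Scalar state space model over a 1-D input sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    In Mamba, the transition parameters are themselves functions of
    the input, which is what lets the model adapt its dynamics to the
    observed audio features.
    """
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys
```

Even in this toy form, each output depends on the entire history of inputs, which is the property that helps surface temporal inconsistencies.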

Frame-by-frame analysis, fused into one verdict.

Each sub-model predicts a fakeness score for short time slices across the duration of an input audio clip. Scores are aggregated and compared to a carefully tuned threshold to produce a final real-vs-fake classification.

DETECT-2B frame-by-frame analysis of an authentic audio clip

Granular, frame-level predictions

DETECT-2B doesn't just return a single pass/fail. Its output is a granular, frame-by-frame analysis of the audio stream, with a spoof prediction for every frame.

The raw fakeness scores can be returned directly, or the API can aggregate them and apply the classification threshold to produce a single overall prediction — tunable to your tolerance for false positives and false negatives.
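The aggregation step can be sketched as a mean over frame scores compared against a tunable threshold. The API's real aggregation rule and calibrated threshold are internal; this sketch only illustrates the false-positive/false-negative trade-off.

```python
def classify_clip(frame_scores, threshold=0.5):
    """Aggregate per-frame fakeness scores into a single verdict.

    Raising the threshold trades fewer false positives for more
    false negatives; lowering it does the opposite.
    """
    clip_score = sum(frame_scores) / len(frame_scores)
    label = "fake" if clip_score >= threshold else "real"
    return {"score": clip_score, "label": label}

# A clip where the second half looks synthetic:
result = classify_clip([0.12, 0.08, 0.91, 0.88, 0.79], threshold=0.5)
```

A fraud-screening deployment might lower the threshold to catch more deepfakes, while a moderation pipeline wary of flagging legitimate uploads might raise it.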

DETECT-2B detecting a deepfake across the duration of a clip

Parameter efficient by design

By leveraging pre-trained components and efficient fine-tuning techniques, DETECT-2B achieves state-of-the-art performance while staying relatively fast to train and lightweight to deploy.

That means sub-second inference — fast enough to drop into real-time audio pipelines, contact centers, and content moderation systems.

Tested against unseen speakers, methods, and languages.

Our evaluation set is intentionally adversarial: unseen speakers, unseen deepfake generation methods, and languages the model never trained on — sourced from academic datasets and diverse real-world audio.

Low equal error rate

DETECT-2B achieves a low equal error rate (EER), correctly identifying the vast majority of deepfakes while maintaining a very low false positive rate. That is a substantial improvement over the original Detect model.

Consistent across languages

Consistently high accuracy across a wide variety of languages, including those not seen during training. The model is learning language-agnostic cues of audio manipulation.

Robust to new generation methods

Strong performance on the latest synthetic audio approaches — even methods not represented in training data. It's learning the fundamentals of synthesis, not memorizing patterns.

DETECT-2B accuracy broken down by language
Accuracy across languages (seen and unseen during training)
DETECT-2B detection accuracy across different deepfake generation methods
Performance across deepfake generation methods

Where teams deploy DETECT-2B.

Whenever voice carries trust — in the contact center, in media, in communications — DETECT-2B helps verify that what you're hearing is real.

Contact centers

Stop voice-cloned social engineering

Screen inbound calls for synthetic voices before they reach an agent — blocking cloning-based fraud in real time.

Media & journalism

Verify audio before it's published

Add a deepfake check to the editorial workflow. Upload a clip, get a granular fakeness score, make a confident call.

Platform trust & safety

Moderate synthetic audio at scale

Batch-analyze user-uploaded audio through the API to surface manipulated content for review without blocking legitimate uploads.

Enterprise security

Authenticate voice-based approvals

Layer DETECT-2B into executive communications, wire-transfer approvals, and sensitive voice workflows as an extra line of defense.

A simple, flexible API — or a dashboard, your call.

Two ways to integrate DETECT-2B: a lightweight REST API for pipelines at scale, or a web-based dashboard for teams who want a visual interface.

REST API

Submit audio clips individually or in batches. Receive raw frame-level fakeness scores or a single aggregated prediction. Classification thresholds are adjustable to balance false positives and false negatives for your use case.

# Analyze an audio clip
POST https://app.resemble.ai/api/v2/detect
Authorization: Bearer <token>
Content-Type: multipart/form-data

file=@clip.wav
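A hedged sketch of consuming such a response in Python. The JSON field names (`frame_scores`, `label`) are assumptions for illustration, not the documented schema; check the API reference for the actual response shape.

```python
import json

def parse_detect_response(body):
    """Pull frame-level scores and the overall verdict out of a
    hypothetical DETECT-2B response payload.

    Field names here are illustrative assumptions, not the real schema.
    """
    data = json.loads(body)
    return data.get("frame_scores", []), data.get("label")

# Hypothetical payload shape:
sample = '{"frame_scores": [0.12, 0.88, 0.91], "label": "fake"}'
scores, label = parse_detect_response(sample)
```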
Request API access

Web dashboard

For customers who prefer visual interaction: upload audio files, view frame-by-frame analysis, and adjust detection settings — no API work required. Ideal for trust & safety teams and editorial reviewers.

# What you get in the dashboard
- Drag-and-drop audio upload
- Frame-by-frame fakeness timeline
- Per-language breakdowns
- Adjustable classification threshold
- Team sharing & audit history
Book a dashboard demo

More about DETECT-2B.

What is DETECT-2B and how does it differ from previous deepfake detection models?
DETECT-2B is the latest generation of Resemble AI's deepfake detection solution. It represents a significant advancement over previous models, featuring:
  • An ensemble of multiple sub-models
  • Pre-trained self-supervised audio representation models
  • Efficient fine-tuning techniques
  • Advanced sequence modeling, including Mamba-SSM (State Space Models)
  • Greater parameter efficiency
  • Improved accuracy and performance across various languages and deepfake generation methods
How does DETECT-2B work to identify deepfake audio?
  • An ensemble of sub-models analyzes different aspects of the audio.
  • The model processes short time slices across the duration of an input audio clip.
  • It predicts a fakeness score for each slice.
  • These scores are aggregated and compared to a tuned threshold.
  • Based on this comparison, it makes a final real-vs-fake classification for the full clip.
  • Pre-trained components and efficient fine-tuning keep it fast and lightweight.
What is Mamba-SSM and why is it important for deepfake detection?
Mamba-SSM, a sequence-modeling architecture based on State Space Models, enhances temporal modeling in DETECT-2B:
  • It uses stochastic processes to model state transitions within audio sequences.
  • This approach captures temporal dynamics in audio signals more effectively than conventional classifiers.
  • It enables adaptive state transitions based on observed audio features.
  • The probabilistic framework is robust to variations and noise.
  • It detects subtle artifacts traditional classifiers miss.
  • It integrates cleanly with self-supervised learning models like Wav2Vec2.
How effective is DETECT-2B across different languages and accents?
DETECT-2B performs consistently well on a diverse range of languages, including those not seen during training. This cross-lingual performance is primarily driven by extensive multi-lingual training data and pre-trained models like Wav2Vec2 — the model learns language-agnostic features indicative of audio manipulation.
What kind of data was used to train and evaluate DETECT-2B?
  • Training data includes a large amount of real and fake audio generated using various methods.
  • It covers a wide range of speakers across multiple languages.
  • Strict separation between speakers in the training and evaluation sets prevents overfitting.
  • The evaluation dataset is very large and includes unseen speakers, deepfake generation methods, and languages.
  • It incorporates academic datasets and diverse real-world sources.
How can customers integrate DETECT-2B into their own systems?
Two main options:
  • API integration — flexible API for individual or batch submissions, raw or aggregated predictions, adjustable thresholds.
  • Web dashboard — upload audio, view results, and tune settings visually without writing code.
What are the future plans for improving DETECT-2B?
Planned research directions include advanced representation learning, new model architectures, training-data expansion, adaptation to emerging generation methods, better real-time efficiency, improved cross-lingual performance, and robustness against adversarial attacks.
Get complete generative AI security
Join thousands of developers and enterprises securing with Resemble AI