PERTH WATERMARKER • RESEMBLE RESEARCH

An invisible watermark for every voice we generate.

PerTh is our neural speech watermarker — a deep network that embeds imperceptible, tamper-resistant data into every clip Resemble creates, so you can verify its origin long after it leaves your app.

~100%
Data recovery across standard attacks
0 dB
Audible difference to the listener
1
Parameter to watermark any Resemble AI-generated clip
THE PROBLEM

As generative voices approach human quality, verification has to keep up.

Photo-editing tools taught us we can’t trust images on sight. Generative AI now does the same for speech — with almost none of the skill barrier. PerTh is our answer: every clip Resemble generates leaves with an invisible signature you can verify later.

STEP 1 • EMBED

Encode at generation

Every Resemble-generated clip passes through the PerTh model, which embeds a data payload into frequency regions chosen using psychoacoustics — inaudible to human ears by design.

STEP 2 • SURVIVE

Withstand real-world handling

The payload rides along through resampling, MP3 compression, time-stretching, time-shifting, and added noise. Audio gets edited in production — PerTh keeps up.

STEP 3 • RECOVER

Verify on demand

Run suspect audio through our decoder. If it came from Resemble, we recover the embedded data — even from a short segment of non-silent speech.

WHAT PERTH DOES

Provenance baked into the waveform

A watermark is only useful if it survives the journey from generation to wherever your audio ends up. PerTh is designed for the messy reality of the modern content pipeline.

Imperceptible by design
Built on auditory masking — the way louder sounds hide quieter ones nearby in frequency and time. The watermark lives inside that masked region, so nothing is added that listeners can hear.
Tightly coupled to speech
Data is encoded into the frequencies most common in speech, not a detachable side channel. That coupling is what makes the signature difficult to strip without destroying the audio itself.
Resilient to attacks
Trained against a suite of adversarial transformations — resampling, re-encoding, speed changes, noise injection — so decoding stays reliable long after a clip has been edited, uploaded, and shared.
Decodes from short clips
Data is embedded across the waveform so a fragment — any non-silent segment — is enough to recover the payload. You don’t need the original file to verify it.
Transparent to end users
No workflow changes. No format changes. Customers keep shipping the same high-fidelity audio — the watermark is applied invisibly as part of generation.
Pair with Detect
Use PerTh alongside Resemble Detect to verify Resemble-generated content and flag synthetic audio from elsewhere — a layered defense for provenance and deepfake exposure.
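The "decodes from short clips" property above follows from redundancy: if the payload is tiled across every frame of the waveform, any surviving fragment can be majority-voted back to the original bits. A minimal sketch of that idea (illustrative only — PerTh's actual embedding is learned, and `embed_redundant`/`recover` are invented helpers, not Resemble APIs):

```python
import numpy as np

def embed_redundant(payload_bits, n_frames):
    """Tile the payload into every frame, so any fragment of the audio
    carries a full copy of the data (toy stand-in for PerTh's learned,
    distributed embedding)."""
    return np.tile(payload_bits, (n_frames, 1))

def recover(frames):
    """Majority-vote the payload back from whatever frames survived."""
    return (frames.mean(axis=0) > 0.5).astype(int)

payload = np.array([1, 0, 1, 1, 0, 0, 1, 0])
frames = embed_redundant(payload, n_frames=100)

# Keep only a short fragment (10 of 100 frames) and corrupt 3 of them.
fragment = frames[40:50].copy()
fragment[:3] = 1 - fragment[:3]

print(recover(fragment))  # → [1 0 1 1 0 0 1 0]
```

Each bit survives 7-to-3 in the vote, so the fragment alone recovers the full payload — no original file required.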
HOW THE MODEL WORKS

PerTh: Perceptual meets Threshold

The model exploits the way humans perceive audio, then encodes data only into the regions we can’t hear — the ones that sit under a perceptual threshold.

Psychoacoustics as a channel
Human hearing has uneven sensitivity across frequencies, which means more information can be hidden in bands we’re less sensitive to. A second, richer effect — auditory masking — describes how a louder sound creates a “blanket” in amplitude–frequency–time space that hides quieter sounds nearby.
Encode inside the mask
PerTh places watermark energy inside those masked regions. The result: a signal that carries data but stays perceptually invisible against the speech it rides on.
Trained against attacks
During training we apply regularization — added noise, time-stretching, time-shifting — before the decoder sees the audio. That’s what produces near-100% recovery across standard transformations like resampling and re-encoding.
Frequency-aware placement
The masking effect is not uniform; it depends on the frequency of the masker sound itself. PerTh accounts for this explicitly when choosing where to place payload energy.
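A toy version of the masking idea above: treat a frequency bin as "masked" when a much louder neighbor sits nearby, and make only those bins eligible to carry watermark energy. This is a deliberately crude sketch — PerTh's masking model is learned and frequency-dependent, and `masked_bins` with its thresholds is invented for illustration:

```python
import numpy as np

def masked_bins(magnitude, ratio=0.1, radius=2):
    """A bin is 'masked' when its magnitude is far below the local
    maximum within +/- `radius` bins: a loud neighbor hides it.
    (Toy model; real psychoacoustic masks also depend on the
    masker's absolute frequency and on time.)"""
    n = len(magnitude)
    masked = np.zeros(n, dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        masked[i] = magnitude[i] < ratio * magnitude[lo:hi].max()
    return masked

# One loud tone at bin 8; its quiet neighbors become safe hiding spots.
spectrum = np.full(16, 0.01)
spectrum[8] = 1.0
print(np.flatnonzero(masked_bins(spectrum)))  # → [ 6  7  9 10]
```

Note that bins far from the loud tone are not masked — adding energy there would be audible, which is why placement has to follow the masker around.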
LISTEN

Two clips. Same speaker. Spot the watermark.

Both files were generated by Resemble. One has the PerTh watermark applied; the other does not. The difference is imperceptible — exactly the point.

The recovery rate stays near 100% across a typical attack suite — resampling, re-encoding, noise injection, time-stretching. Attack types include:
Pitch shift up/down
Time stretch up/down
High pass
Low pass
Compressor
Add delay
Add noise
Resample up/down
No attack
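The training-time attacks listed above can be sketched as a random augmentation pipeline applied between encoder and decoder; learning to decode through these transformations is what makes recovery robust afterwards. A minimal, hypothetical version in NumPy (a real pipeline would use proper codecs, filters, and resamplers rather than these toy transforms):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, snr_db=30.0):
    """Mix in white noise at a target signal-to-noise ratio."""
    noise = rng.standard_normal(x.shape)
    scale = np.sqrt(np.mean(x**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return x + scale * noise

def time_shift(x, max_shift=160):
    """Shift the clip by up to `max_shift` samples."""
    return np.roll(x, rng.integers(-max_shift, max_shift + 1))

def time_stretch(x, rate):
    """Crude stretch/squeeze via linear interpolation."""
    idx = np.linspace(0, len(x) - 1, int(len(x) / rate))
    return np.interp(idx, np.arange(len(x)), x)

def attack(x):
    """Randomly compose attacks, as a training-time regularizer would."""
    if rng.random() < 0.5:
        x = add_noise(x)
    if rng.random() < 0.5:
        x = time_shift(x)
    if rng.random() < 0.5:
        x = time_stretch(x, rate=rng.uniform(0.9, 1.1))
    return x

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
attacked = attack(tone)
```

During training, the decoder only ever sees `attack(watermarked_audio)`, so an edited or re-encoded clip at verification time looks like just another training example.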
WITH WATERMARK
NO WATERMARK
~100%
Data recovery under standard attack suite — resample, re-encode, stretch, noise.
0
Perceptible artifacts introduced — verified on the same-speaker A/B above.
100%
Of Resemble-generated audio can be watermarked at generation — no workflow change.
BUILT ON RESEMBLE AI

PerTh is one layer of a broader safety stack

Watermarking answers “did this come from us?” Detection answers “is this synthetic at all?” Used together, they close the loop from creation to distribution.
PRODUCT
Resemble Watermarker
Ship the PerTh watermark as part of your production pipeline — watermark generated audio and verify it later via API.
EXPLORE RESEMBLE WATERMARKER
PRODUCT
Deepfake Detection
Real-time, multimodal detection across audio, video, and image. The complement to watermarking for content you didn’t generate.
EXPLORE RESEMBLE DETECT
MODEL
DETECT-3B
Our flagship deepfake detection architecture — efficient and accurate across languages and generation methods.
EXPLORE THE MODEL
Frequently asked questions
Will the watermark change how my audio sounds?
No. PerTh places data inside frequency regions that fall below the human perceptual threshold — the regions already masked by louder nearby sounds. The audio Resemble generates remains at the same fidelity your customers expect.
Does PerTh survive MP3 compression, resampling, or speed changes?
Yes. The model is trained with regularization that explicitly simulates these transformations — added noise, time-stretching, time-shifting, re-encoding — so the decoder recovers the payload at nearly 100% across a standard attack suite.
How much audio do I need to verify a clip?
The payload is distributed across the waveform. Any non-silent segment is enough to recover it — you don’t need the full original file, and you don’t need access to the generation request.
Is PerTh applied automatically to Resemble-generated voices?
Yes — PerTh rolls out across Resemble-generated audio as a default safety layer. It does not change your integration, SDKs, or output formats. For customers who need explicit control, watermarking can also be configured per request.
How is PerTh different from deepfake detection?
PerTh answers “did this clip come from Resemble?” with a verifiable signature. Detection — see Resemble Detect and DETECT-3B — answers the broader “is this audio synthetic at all?” question across any source. Most teams use both.
Can an attacker remove the watermark?
Because the payload is tightly coupled to the speech content — encoded into the same frequency regions speech occupies — stripping it reliably without destroying the audio itself is difficult. No watermark is unconditionally unremovable, and this is an active area of research; PerTh is designed to raise that bar well above casual attacks.
How do I get started?
Talk to us about your use case — voice agent, content platform, media production, authentication — and we’ll scope the right setup. Book a demo or explore the AI Watermarker product page.
Get complete generative AI security
Join thousands of developers and enterprises securing with Resemble AI