PERTH WATERMARKER • RESEMBLE RESEARCH

An invisible watermark for every voice we generate.

PerTh is our neural speech watermarker — a deep network that embeds imperceptible, tamper-resistant data into every clip Resemble creates, so you can verify its origin long after it leaves your app.

~100%
Data recovery across standard attacks
0 dB
Audible difference to the listener
1
Parameter to watermark any Resemble AI-generated clip
THE PROBLEM

As generative voices approach human quality, verification has to keep up.

Photo-editing tools taught us we can’t trust images on sight. Generative AI now does the same for speech — with almost none of the skill barrier. PerTh is our answer: every clip Resemble generates leaves with an invisible signature you can verify later.

STEP 1 • EMBED

Encode at generation

Every Resemble-generated clip passes through the PerTh model, which embeds a data payload into frequency regions chosen using psychoacoustics — inaudible to human ears by design.

STEP 2 • SURVIVE

Withstand real-world handling

The payload rides along through resampling, MP3 compression, time-stretching, time-shifting, and added noise. Audio gets edited in production — PerTh keeps up.

STEP 3 • RECOVER

Verify on demand

Run suspect audio through our decoder. If it came from Resemble, we recover the embedded data — even from a short segment of non-silent speech.

WHAT PERTH DOES

Provenance baked into the waveform

A watermark is only useful if it survives the journey from generation to wherever your audio ends up. PerTh is designed for the messy reality of the modern content pipeline.

Imperceptible by design
Built on auditory masking — the way louder sounds hide quieter ones nearby in frequency and time. The watermark lives inside that masked region, so nothing is added that listeners can hear.
Tightly coupled to speech
Data is encoded into the frequencies most common in speech, not a detachable side channel. That coupling is what makes the signature difficult to strip without destroying the audio itself.
Resilient to attacks
Trained against a suite of adversarial transformations — resampling, re-encoding, speed changes, noise injection — so decoding stays reliable long after a clip has been edited, uploaded, and shared.
Decodes from short clips
Data is embedded across the waveform so a fragment — any non-silent segment — is enough to recover the payload. You don’t need the original file to verify it.
Transparent to end users
No workflow changes. No format changes. Customers keep shipping the same high-fidelity audio — the watermark is applied invisibly as part of generation.
Pair with Detect
Use PerTh alongside Resemble Detect to verify Resemble-generated content and flag synthetic audio from elsewhere — a layered defense for provenance and deepfake exposure.
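The "decodes from short clips" property above follows from redundancy: if the payload is tiled across every frame of the waveform, any surviving fragment can be majority-voted back to the original bits. A minimal sketch of that idea (illustrative only — PerTh's actual embedding is learned, and `embed_redundant`/`recover` are invented helpers, not Resemble APIs):

```python
import numpy as np

def embed_redundant(payload_bits, n_frames):
    """Tile the payload into every frame, so any fragment of the audio
    carries a full copy of the data (toy stand-in for PerTh's learned,
    distributed embedding)."""
    return np.tile(payload_bits, (n_frames, 1))

def recover(frames):
    """Majority-vote the payload back from whatever frames survived."""
    return (frames.mean(axis=0) > 0.5).astype(int)

payload = np.array([1, 0, 1, 1, 0, 0, 1, 0])
frames = embed_redundant(payload, n_frames=100)

# Keep only a short fragment (10 of 100 frames) and corrupt 3 of them.
fragment = frames[40:50].copy()
fragment[:3] = 1 - fragment[:3]

print(recover(fragment))  # → [1 0 1 1 0 0 1 0]
```

Each bit survives 7-to-3 in the vote, so the fragment alone recovers the full payload — no original file required.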
HOW THE MODEL WORKS

PerTh: Perceptual meets Threshold

The model exploits the way humans perceive audio, then encodes data only into the regions we can’t hear — the ones that sit under a perceptual threshold.

Psychoacoustics as a channel
Human hearing has uneven sensitivity across frequencies, which means more information can be hidden in bands we’re less sensitive to. A second, richer effect — auditory masking — describes how a louder sound creates a “blanket” in amplitude–frequency–time space that hides quieter sounds nearby.
Encode inside the mask
PerTh places watermark energy inside those masked regions. The result: a signal that carries data but stays perceptually invisible against the speech it rides on.
Trained against attacks
During training we apply regularization — added noise, time-stretching, time-shifting — before the decoder sees the audio. That’s what produces near-100% recovery across standard transformations like resampling and re-encoding.
Frequency-aware placement
The masking effect is not uniform; it depends on the frequency of the masker sound itself. PerTh accounts for this explicitly when choosing where to place payload energy.
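A toy version of the masking idea above: treat a frequency bin as "masked" when a much louder neighbor sits nearby, and make only those bins eligible to carry watermark energy. This is a deliberately crude sketch — PerTh's masking model is learned and frequency-dependent, and `masked_bins` with its thresholds is invented for illustration:

```python
import numpy as np

def masked_bins(magnitude, ratio=0.1, radius=2):
    """A bin is 'masked' when its magnitude is far below the local
    maximum within +/- `radius` bins: a loud neighbor hides it.
    (Toy model; real psychoacoustic masks also depend on the
    masker's absolute frequency and on time.)"""
    n = len(magnitude)
    masked = np.zeros(n, dtype=bool)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        masked[i] = magnitude[i] < ratio * magnitude[lo:hi].max()
    return masked

# One loud tone at bin 8; its quiet neighbors become safe hiding spots.
spectrum = np.full(16, 0.01)
spectrum[8] = 1.0
print(np.flatnonzero(masked_bins(spectrum)))  # → [ 6  7  9 10]
```

Note that bins far from the loud tone are not masked — adding energy there would be audible, which is why placement has to follow the masker around.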
LISTEN

Two clips. Same speaker. Spot the watermark.

Both files were generated by Resemble. One has the PerTh watermark applied; the other does not. The difference is imperceptible — exactly the point.

The recovery rate stays near 100% across a typical attack suite — resampling, re-encoding, noise injection, time-stretching. Attack types include:
Pitch shift up/down
Time stretch up/down
High pass
Low pass
Compressor
Add delay
Add noise
Resample up/down
No attack
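The training-time attacks listed above can be sketched as a random augmentation pipeline applied between encoder and decoder; learning to decode through these transformations is what makes recovery robust afterwards. A minimal, hypothetical version in NumPy (a real pipeline would use proper codecs, filters, and resamplers rather than these toy transforms):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, snr_db=30.0):
    """Mix in white noise at a target signal-to-noise ratio."""
    noise = rng.standard_normal(x.shape)
    scale = np.sqrt(np.mean(x**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))
    return x + scale * noise

def time_shift(x, max_shift=160):
    """Shift the clip by up to `max_shift` samples."""
    return np.roll(x, rng.integers(-max_shift, max_shift + 1))

def time_stretch(x, rate):
    """Crude stretch/squeeze via linear interpolation."""
    idx = np.linspace(0, len(x) - 1, int(len(x) / rate))
    return np.interp(idx, np.arange(len(x)), x)

def attack(x):
    """Randomly compose attacks, as a training-time regularizer would."""
    if rng.random() < 0.5:
        x = add_noise(x)
    if rng.random() < 0.5:
        x = time_shift(x)
    if rng.random() < 0.5:
        x = time_stretch(x, rate=rng.uniform(0.9, 1.1))
    return x

tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
attacked = attack(tone)
```

During training, the decoder only ever sees `attack(watermarked_audio)`, so an edited or re-encoded clip at verification time looks like just another training example.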
WITH WATERMARK
NO WATERMARK
~100%
Data recovery under standard attack suite — resample, re-encode, stretch, noise.
0
Perceptible artifacts introduced — verified on the same-speaker A/B above.
100%
Of Resemble-generated audio can be watermarked at generation — no workflow change.
BUILT ON RESEMBLE AI

PerTh is one layer of a broader safety stack

Watermarking answers “did this come from us?” Detection answers “is this synthetic at all?” Used together, they close the loop from creation to distribution.
PRODUCT
Resemble Watermarker
Ship the PerTh watermark as part of your production pipeline — watermark generated audio and verify it later via API.
EXPLORE RESEMBLE WATERMARKER
PRODUCT
Deepfake Detection
Real-time, multimodal detection across audio, video, and image. The complement to watermarking for content you didn’t generate.
EXPLORE RESEMBLE DETECT
MODEL
DETECT-3B
Our flagship deepfake detection architecture — efficient and accurate across languages and generation methods.
EXPLORE THE MODEL
Frequently asked questions
Will the watermark change how my audio sounds?
No. PerTh places data inside frequency regions that fall below the human perceptual threshold — the regions already masked by louder nearby sounds. The audio Resemble generates remains at the same fidelity your customers expect.
Does PerTh survive MP3 compression, resampling, or speed changes?
Yes. The model is trained with regularization that explicitly simulates these transformations — added noise, time-stretching, time-shifting, re-encoding — so the decoder recovers the payload at nearly 100% across a standard attack suite.
How much audio do I need to verify a clip?
The payload is distributed across the waveform. Any non-silent segment is enough to recover it — you don’t need the full original file, and you don’t need access to the generation request.
Is PerTh applied automatically to Resemble-generated voices?
Yes — PerTh rolls out across Resemble-generated audio as a default safety layer. It does not change your integration, SDKs, or output formats. For customers who need explicit control, watermarking can also be configured per request.
How is PerTh different from deepfake detection?
PerTh answers “did this clip come from Resemble?” with a verifiable signature. Detection — see Resemble Detect and DETECT-3B — answers the broader “is this audio synthetic at all?” question across any source. Most teams use both.
Can an attacker remove the watermark?
Because the payload is tightly coupled to the speech content — encoded into the same frequency regions speech occupies — stripping it reliably without destroying the audio itself is difficult. No watermark is unconditionally unremovable, and this is an active area of research; PerTh is designed to raise that bar well above casual attacks.
How do I get started?
Talk to us about your use case — voice agent, content platform, media production, authentication — and we’ll scope the right setup. Book a demo or explore the AI Watermarker product page.
Get complete generative AI security
Join thousands of developers and enterprises securing with Resemble AI