PERTH MULTIMODAL WATERMARKER • RESEMBLE RESEARCH

A multimodal watermark for every piece of content you generate

PerTh Multimodal extends our neural watermarking architecture beyond audio to cover video, image, and text with one API call. It encodes explicit marks and reads C2PA and SynthID.

4 · Modalities covered: audio, image, video, and text
~99.9% · Data recovery across attack types
16-256 · Explicit payload bits for encoding custom messages
THE PROBLEM

When any modality can be faked, every modality needs a mark.

The tools to create convincing synthetic content now require almost no technical skill. EU AI Act Article 50 requires machine-readable marking of AI-generated content by August 2026. PerTh Multimodal embeds an invisible, persistent watermark at creation across audio, video, image, and text so the proof of origin travels with the content.

STEP 1 • ENCODE

Ownership embedded at creation

PerTh Multimodal embeds a data payload into perceptually masked regions of the signal. Inaudible in audio, imperceptible in image and video, semantically neutral in text. Add a custom identifier and it travels with the content.

STEP 2 • SURVIVE

Withstand real-world handling

The payload persists through format conversion, re-encoding, compression, and editing. PerTh Multimodal is trained against the transformations content encounters in the real world.
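Training a watermark against real-world handling usually means placing a random "attack" layer between the embedder and the decoder, so the decoder only ever learns from degraded copies. The sketch below is a minimal NumPy illustration of that idea, not PerTh's training code. It assumes mono float audio in [-1, 1], and the three transforms (clipped Gaussian noise, an 8-bit mu-law round trip, and a linear-interpolation resample round trip) and their parameters are hypothetical stand-ins loosely modeled on conditions named in the benchmark further down.

```python
import numpy as np

# Hypothetical training-time attack layer; parameters are illustrative only.

def gaussian_noise_clipped(x: np.ndarray, rng: np.random.Generator,
                           snr_db: float = 20.0) -> np.ndarray:
    """Add white Gaussian noise at the given SNR, then clip to [-1, 1]."""
    signal_power = np.mean(x ** 2)
    noise_std = np.sqrt(signal_power / 10 ** (snr_db / 10))
    return np.clip(x + rng.normal(0.0, noise_std, x.shape), -1.0, 1.0)


def mulaw_roundtrip(x: np.ndarray, mu: int = 255) -> np.ndarray:
    """Mu-law compress, quantize to mu + 1 = 256 levels, and expand again."""
    x = np.clip(x, -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    y = np.round((y + 1) / 2 * mu) / mu * 2 - 1  # snap to the 8-bit grid
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu


def resample_roundtrip(x: np.ndarray, factor: float = 0.5) -> np.ndarray:
    """Downsample by linear interpolation, then interpolate back to len(x)."""
    n = len(x)
    t = np.arange(n)
    t_low = np.linspace(0, n - 1, int(n * factor))
    return np.interp(t, t_low, np.interp(t_low, t, x))


ATTACKS = {
    "gaussian_noise_clipped": lambda x, rng: gaussian_noise_clipped(x, rng),
    "mulaw": lambda x, rng: mulaw_roundtrip(x),
    "resample": lambda x, rng: resample_roundtrip(x),
}


def random_attack(x: np.ndarray, rng: np.random.Generator):
    """Apply one attack chosen uniformly at random; return (name, audio)."""
    names = list(ATTACKS)
    name = names[rng.integers(len(names))]
    return name, ATTACKS[name](x, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16_000) / 16_000
    audio = 0.5 * np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone
    for _ in range(3):
        name, attacked = random_attack(audio, rng)
        print(name, float(np.max(np.abs(attacked - audio))))
```

In a training loop of this kind, the embedder's output would pass through `random_attack` before reaching the decoder, so the loss rewards bit recovery from damaged audio rather than from the pristine signal.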

STEP 3 • DECODE

Verify on demand

Run any file through the decoder and PerTh Multimodal returns your custom identifier, C2PA signatures, and SynthID in a single provenance report.
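One way to picture the single provenance report is as a record with one field per mark type, which a caller can query uniformly. The dataclass below is a hypothetical shape assumed for illustration only; `ProvenanceReport` and its field names are not Resemble's API or schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProvenanceReport:
    """Hypothetical combined decode result; not Resemble's actual schema."""
    perth_identifier: Optional[str] = None  # custom payload, if a PerTh mark was found
    c2pa_signers: list[str] = field(default_factory=list)  # signers of any C2PA manifests
    synthid_detected: Optional[bool] = None  # None = not checked for this file

    def has_any_provenance(self) -> bool:
        """True if at least one of the three layers produced a positive result."""
        return (self.perth_identifier is not None
                or bool(self.c2pa_signers)
                or bool(self.synthid_detected))

    def summary(self) -> str:
        """One-line, human-readable digest of whichever marks were found."""
        parts = []
        if self.perth_identifier is not None:
            parts.append(f"PerTh id={self.perth_identifier}")
        if self.c2pa_signers:
            parts.append("C2PA signed by " + ", ".join(self.c2pa_signers))
        if self.synthid_detected:
            parts.append("SynthID present")
        return "; ".join(parts) or "no provenance marks found"


if __name__ == "__main__":
    report = ProvenanceReport(perth_identifier="acme-tts-v3",
                              c2pa_signers=["Acme Corp"])
    print(report.summary())  # PerTh id=acme-tts-v3; C2PA signed by Acme Corp
```

In this sketch, `synthid_detected` is kept tri-state so that "checked and absent" stays distinguishable from "not checked", which matters when a detector does not cover every modality.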

WHAT PERTH MULTIMODAL DOES

PerTh Multimodal extends the architecture to every modality

The same perceptual masking principles that made PerTh work for audio now apply to video, image, and text, with explicit payloads so that every file carries a verifiable identifier.

Multimodal by design
PerTh Multimodal extends PerTh's neural watermarking architecture to audio, video, image, and text. Each modality uses algorithms tuned to its signal characteristics: psychoacoustics for audio, pixel-domain masking for image and video, semantic rewriting for text.
Explicit payload with custom identifier
Encode your organization name, system ID, or any string that fits the payload directly into the mark. Audio supports a 16-bit payload; image and video support payloads of up to 256 bits. Anyone who decodes the mark gets that identifier back, satisfying the EU AI Act's system identifier requirement.
Reads third-party marks
Decodes PerTh watermarks, C2PA signatures, and SynthID in a single pass. Designed to extend to new marks as other providers release them.
Resilient to real-world attacks · Audio
Near-100% data recovery across resampling, re-encoding, noise injection, compression, and pitch shifting.
Improved accuracy over baseline · Image and video
Fine-tuned to outperform the Meta open-source baseline, with improved recovery under compression, cropping, brightness changes, and blur.
Pair with Detect
Use PerTh Multimodal alongside Resemble Detect to verify Resemble-generated content and flag synthetic content from any source.
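The payload sizes above come down to simple capacity arithmetic: 16 bits distinguish 2^16 = 65,536 values, or two ASCII characters, while 256 bits hold 32 bytes. The sketch below packs an identifier into a fixed-size bit payload under that assumption. The helper names are hypothetical, the plain big-endian UTF-8 packing is an illustrative choice rather than PerTh's actual encoding, and it ignores any error-correction coding a real system might add.

```python
"""Hypothetical payload packing; not PerTh's actual encoding."""


def identifier_to_bits(identifier: str, payload_bits: int) -> list[int]:
    """UTF-8 encode `identifier` and return exactly `payload_bits` bits.

    Bits are big-endian within each byte; shorter identifiers are padded
    with zero bits. Raises ValueError if the identifier does not fit.
    """
    data = identifier.encode("utf-8")
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    if len(bits) > payload_bits:
        raise ValueError(
            f"{identifier!r} needs {len(bits)} bits; payload holds {payload_bits}"
        )
    return bits + [0] * (payload_bits - len(bits))


def bits_to_identifier(bits: list[int]) -> str:
    """Invert identifier_to_bits, dropping the zero-byte padding."""
    data = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return bytes(data).rstrip(b"\x00").decode("utf-8")


def system_id_to_bits(system_id: int, payload_bits: int = 16) -> list[int]:
    """Big-endian bits of a small integer ID, e.g. a registry index."""
    if not 0 <= system_id < 2 ** payload_bits:
        raise ValueError(f"system_id must fit in {payload_bits} bits")
    return [(system_id >> shift) & 1 for shift in range(payload_bits - 1, -1, -1)]


if __name__ == "__main__":
    bits = identifier_to_bits("acme-tts-v3", 256)  # 11 bytes = 88 of 256 bits
    print(bits_to_identifier(bits))                # acme-tts-v3
    print(system_id_to_bits(5))                    # 13 zeros, then 1, 0, 1
```

Because 16 bits cannot hold a typical organization name, an audio deployment would plausibly store a short registry ID in the mark and resolve it to a name out of band; `system_id_to_bits` shows that variant.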
HOW THE MODEL WORKS

Perceptual masking, applied per signal

PerTh Multimodal takes the core insight of the original PerTh model, which is to encode data only into regions humans can't perceive, and applies it to four modalities, each with signal-appropriate algorithms.
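For audio, the "regions humans can't perceive" idea can be illustrated with a toy frequency-masking model: loud spectral components raise a threshold in neighboring bins, and a watermark carrier is scaled to sit at or below it. This is a minimal NumPy sketch for a single frame, not PerTh's model. The linear-bin spreading slope (`slope_db_per_bin`) and fixed `offset_db` are illustrative assumptions; real psychoacoustic models, such as those used in MPEG audio coders, work on a Bark scale with tonality corrections and an absolute threshold of hearing.

```python
import numpy as np

# Toy frequency-masking model; offset_db and slope_db_per_bin are illustrative.

def masking_threshold(magnitude: np.ndarray, offset_db: float = 20.0,
                      slope_db_per_bin: float = 3.0) -> np.ndarray:
    """Per-bin masking threshold for one frame's magnitude spectrum.

    Bin j masks bin k at a level offset_db below |X_j|, falling off by
    slope_db_per_bin for every bin of distance |k - j|. The threshold for
    bin k is the strongest of these contributions.
    """
    bins = np.arange(len(magnitude))
    distance = np.abs(bins[:, None] - bins[None, :])
    attenuation_db = offset_db + slope_db_per_bin * distance
    return (magnitude[None, :] * 10.0 ** (-attenuation_db / 20.0)).max(axis=1)


def shape_carrier(frame: np.ndarray, rng: np.random.Generator,
                  offset_db: float = 20.0, slope_db_per_bin: float = 3.0):
    """Random-phase carrier whose spectrum sits at the frame's toy threshold.

    A real embedder would modulate payload bits onto a carrier like this;
    here we only show that its energy is confined under the threshold.
    Returns (carrier, threshold).
    """
    n = len(frame)
    threshold = masking_threshold(np.abs(np.fft.rfft(frame)),
                                  offset_db, slope_db_per_bin)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=threshold.shape)
    phase[0] = 0.0          # the DC bin of a real signal must be real
    if n % 2 == 0:
        phase[-1] = 0.0     # and so must the Nyquist bin for even n
    carrier = np.fft.irfft(threshold * np.exp(1j * phase), n=n)
    return carrier, threshold


if __name__ == "__main__":
    fs, n = 16_000, 512
    t = np.arange(n) / fs
    frame = np.sin(2 * np.pi * 1000 * t)            # one loud 1 kHz tone
    carrier, threshold = shape_carrier(frame, np.random.default_rng(0))
    # The carrier peaks in bin 32 (32 * 16000 / 512 = 1000 Hz), under the tone.
    print(int(np.argmax(np.abs(np.fft.rfft(carrier)))))  # 32
```

The design point is that the carrier's energy clusters where the host signal is loudest, so a stronger mark can hide next to loud content while quiet regions receive almost none.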

Audio psychoacoustic masking
Auditory masking creates a perceptual blanket in amplitude-frequency-time space where quieter sounds are hidden by louder ones nearby. The watermark lives inside that region. The model is trained with regularization against resampling, re-encoding, noise injection, and time-stretching, producing near-100% recovery across a standard attack suite.
Image and video pixel-domain modification
Imperceptible pixel-domain modifications that survive compression, cropping, brightness adjustment, color jitter, and blur, with accuracy exceeding the Meta open-source baseline.
Text semantic rewriting
A semantic rewriting model makes meaning-preserving changes to word choice and phrasing. The mark is embedded in the pattern of the rewrite. The decoder looks for its own pattern to determine authorship.
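The text approach can be illustrated with classic lexical substitution: each position where a synonym choice is possible carries one bit, and the decoder recovers the bits by checking which variant was used. This toy sketch uses a small, hypothetical synonym table (`SYNONYM_PAIRS`) in place of a learned semantic rewriting model, so its capacity and naturalness are far below what the description above targets.

```python
import re

# Hypothetical synonym pairs: using the first word encodes 0, the second 1.
SYNONYM_PAIRS = [
    ("big", "large"),
    ("quick", "fast"),
    ("begin", "start"),
    ("help", "assist"),
    ("show", "demonstrate"),
]
WORD_TO_PAIR = {word: pair for pair in SYNONYM_PAIRS for word in pair}
WORD_TO_BIT = {word: bit for pair in SYNONYM_PAIRS for bit, word in enumerate(pair)}
WORD_RE = re.compile(r"[A-Za-z]+")


def embed_bits(text: str, bits: list[int]) -> str:
    """Rewrite swappable words so their variants spell out `bits` in order."""
    remaining = iter(bits)

    def swap(match):
        word = match.group(0)
        pair = WORD_TO_PAIR.get(word.lower())
        if pair is None:
            return word                  # not a swappable word
        bit = next(remaining, None)
        if bit is None:
            return word                  # payload exhausted; leave as-is
        chosen = pair[bit]
        return chosen.capitalize() if word[0].isupper() else chosen

    return WORD_RE.sub(swap, text)


def extract_bits(text: str) -> list[int]:
    """Read one bit per swappable word, in order of appearance."""
    return [WORD_TO_BIT[w.lower()] for w in WORD_RE.findall(text)
            if w.lower() in WORD_TO_BIT]


if __name__ == "__main__":
    original = "We begin with a quick demo to show how a big model can help."
    marked = embed_bits(original, [1, 0, 1, 1, 0])
    print(marked)  # We start with a quick demo to demonstrate how a large model can help.
    print(extract_bits(marked))  # [1, 0, 1, 1, 0]
```

Capacity here equals the number of swappable words, and the decoder must know the payload length to ignore trailing swappable words. A rewriting model that can also vary phrasing has far more choices to encode into and can preserve meaning where a fixed synonym swap might not.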
VERIFY

PerTh watermarker: survives compression, re-encoding, and attacks

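Detection accuracy across 18 conditions: 16 real-world attacks plus two controls (no attack and no watermark). PerTh V2 ships with improved robustness to pitch shift, noise, and spectral band masking; against the open-source model, pitch-shift recovery rises from 10% to 94%, while reverb recovery dips from 95% to 88%.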
~100% · Detection on clean and compressed audio (PerTh V2, no attack and standard codecs)
Legend: STRONG (>90%) · MODERATE (70-90%) · WEAK (<70%)

Detection accuracy by condition
Condition                           PerTh (open source, audio only)  PerTh Multimodal
wav_dither_attack                   100%                             100%
random_wav_wavelet_noise            90%                              100%
random_wav_reverb_attack            95%                              88%
random_wav_resample_attack          100%                             100%
random_wav_precision_attack         100%                             100%
random_wav_pitch_shift_attack       10%                              94%
random_wav_mulaw_attack             100%                             100%
random_wav_high_pass_attack         100%                             100%
random_wav_gaussian_noise_clipped   45%                              100%
random_spec_time_mask               100%                             100%
random_spec_stretch                 100%                             98%
random_spec_scale                   100%                             100%
random_spec_lowclip                 100%                             100%
random_spec_highclip                100%                             100%
random_spec_gaussian_noise_clipped  100%                             100%
random_spec_contiguous_band_mask    72%                              98%
no_watermark                        100%                             100%
no_attack                           100%                             100%
Watermark robustness is detection accuracy after each attack transform (1.0 = 100%). "no_attack" is clean, watermarked audio; "no_watermark" is accuracy on unwatermarked audio, so 100% means no false positives.
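The legend's tiers can be applied mechanically to the per-condition results. This sketch transcribes the numbers reported above and counts tiers over the 16 attack conditions, excluding the two controls; it assumes exactly 90% falls in MODERATE, as the legend's 70-90% range suggests.

```python
from collections import Counter

# Detection accuracy (%) transcribed from the results above,
# as (open-source PerTh, PerTh Multimodal).
RESULTS = {
    "wav_dither_attack": (100, 100),
    "random_wav_wavelet_noise": (90, 100),
    "random_wav_reverb_attack": (95, 88),
    "random_wav_resample_attack": (100, 100),
    "random_wav_precision_attack": (100, 100),
    "random_wav_pitch_shift_attack": (10, 94),
    "random_wav_mulaw_attack": (100, 100),
    "random_wav_high_pass_attack": (100, 100),
    "random_wav_gaussian_noise_clipped": (45, 100),
    "random_spec_time_mask": (100, 100),
    "random_spec_stretch": (100, 98),
    "random_spec_scale": (100, 100),
    "random_spec_lowclip": (100, 100),
    "random_spec_highclip": (100, 100),
    "random_spec_gaussian_noise_clipped": (100, 100),
    "random_spec_contiguous_band_mask": (72, 98),
    "no_watermark": (100, 100),
    "no_attack": (100, 100),
}
OPEN_SOURCE, MULTIMODAL = 0, 1


def tier(accuracy: float) -> str:
    """Legend tiers: STRONG > 90, MODERATE 70-90 (inclusive), WEAK < 70."""
    if accuracy > 90:
        return "STRONG"
    if accuracy >= 70:
        return "MODERATE"
    return "WEAK"


def attack_tiers(model: int) -> Counter:
    """Count tiers over the 16 attacks, excluding the no_* control rows."""
    return Counter(tier(scores[model]) for name, scores in RESULTS.items()
                   if not name.startswith("no_"))


def notable_changes(min_delta: int = 5) -> dict[str, int]:
    """Conditions whose accuracy moved by at least `min_delta` points."""
    return {name: new - old for name, (old, new) in RESULTS.items()
            if abs(new - old) >= min_delta}


if __name__ == "__main__":
    print(dict(attack_tiers(OPEN_SOURCE)))
    print(dict(attack_tiers(MULTIMODAL)))
    print(notable_changes())
```

On these numbers, the open-source model has 12 STRONG, 2 MODERATE, and 2 WEAK attack results, while PerTh Multimodal has 15 STRONG and 1 MODERATE (reverb at 88%). The largest gains are pitch shift (+84 points) and clipped Gaussian noise on the waveform (+55).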
BUILT ON RESEMBLE AI

PerTh Multimodal is one layer of a broader safety stack

Watermarking answers “did this come from us?” Detection answers “is this synthetic at all?” Used together, they close the loop from creation to distribution.
PRODUCT
Resemble Watermarker
Ship the watermark as part of your production pipeline. Watermark generated content and verify it later via API.
PRODUCT
Deepfake Detection
Real-time, multimodal detection across audio, video, and image. The complement to watermarking for content you didn’t generate.
MODEL
DETECT-3B
Our flagship deepfake detection architecture, efficient and accurate across languages and generation methods.
Frequently asked questions
How does PerTh Multimodal embed the watermark without affecting signal quality?
PerTh Multimodal places the data payload inside perceptually masked regions: frequencies already masked by louder sounds in audio, imperceptible pixel modifications in image and video, and meaning-preserving rewrites in text. Nothing it adds rises above the perceptual threshold.
What attack types does PerTh Multimodal survive?
For audio, near-100% data recovery across resampling, re-encoding, MP3 compression, pitch shifting, time-stretching, noise injection, high and low pass filtering, and added delay. For image and video, the mark survives compression, cropping, brightness changes, color jitter, and blur.
What is the explicit payload and what can it encode?
The explicit payload (16 bits for audio, up to 256 bits for image and video) lets you embed a custom identifier, such as an organization name, a system ID, or any string that fits within the bit limit, directly into the mark. Decoding returns that identifier, enabling verifiable origin attribution rather than binary watermark detection. This satisfies the EU AI Act Article 50 requirement for a system identifier in the mark.
Does the decoder require the original file?
No. For audio, the payload is distributed across the waveform, so any non-silent segment is sufficient for recovery. You don't need access to the original generation request or file.
Which third-party marks does the decoder read?
PerTh Multimodal reads PerTh watermarks, C2PA signatures, and SynthID in a single pass. The architecture is designed to extend to additional marks as other providers expose their formats.
How does PerTh Multimodal map to EU AI Act Article 50?
Article 50 requires machine-readable marking across modalities, dual independent provenance layers, and a system identifier in the mark. PerTh Multimodal covers all four modalities, embeds a neural watermark and a C2PA signature simultaneously, and supports a custom identifier via the explicit payload.
How do I get started?
Talk to us about your use case. Whether it's a voice agent, content platform, media production, or authentication, we’ll scope the right setup.
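The FAQ answer on decoding without the original file says the audio payload is distributed across the waveform, so any non-silent segment is enough. A common way to get that property is to repeat the full payload in every segment and combine whatever segments are available. The sketch below shows only that combining step and is not PerTh's decoder: it assumes a per-segment decoder (not shown) that returns hard 0/1 bit estimates, and the energy-based silence gate and its `silence_threshold` value are hypothetical.

```python
import numpy as np


def combine_segments(segment_bits, segment_energy, silence_threshold=1e-4):
    """Majority-vote per-segment payload estimates, skipping silent segments.

    segment_bits: (num_segments, payload_bits) array of 0/1 estimates, where
        each segment is assumed to carry a full copy of the payload.
    segment_energy: (num_segments,) mean-square signal level per segment.
    Ties (possible with an even number of voiced segments) resolve to 1.
    """
    segment_bits = np.asarray(segment_bits)
    voiced = np.asarray(segment_energy) > silence_threshold
    if not voiced.any():
        raise ValueError("no non-silent segment to decode from")
    return (segment_bits[voiced].mean(axis=0) >= 0.5).astype(int)


if __name__ == "__main__":
    payload = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    estimates = np.tile(payload, (4, 1))
    estimates[0, 2] ^= 1            # segment 0 misreads bit 2
    estimates[1, 5] ^= 1            # segment 1 misreads bit 5
    estimates[3] = 1 - payload      # segment 3 is silence; its bits are noise
    energy = np.array([0.02, 0.03, 0.01, 0.0])

    print(combine_segments(estimates, energy))            # [1 0 1 1 0 0 1 0]
    # A single clean, non-silent segment is enough on its own:
    print(combine_segments(estimates[2:3], energy[2:3]))  # [1 0 1 1 0 0 1 0]
```

More segments make the vote more robust to per-segment bit errors, while skipping silent segments keeps noise from regions with no carrier energy out of the vote.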