Deepfake fraud at work: video call scams, face swaps, and vishing

A recording of the Resemble AI webinar on how synthetic media is being used to defraud finance and hiring teams, and the controls that hold up in the real world.

Deepfake fraud has moved out of the research lab and into ordinary business workflows. The tools that clone a voice from a few seconds of audio or swap a face onto a live webcam feed are now cheap, fast, and good enough to pass a routine Zoom or Teams call. In this session, Zohaib Ahmed and Will Krispin walk through where this shows up most at work (the meeting, the phone call, the receipt or document, and the hiring pipeline) and what teams can actually do about it.

TLDR: the majority of deepfake attacks happening today are inside your meeting environment. Watch the full recording, view the slides here, and check out the recap below.

Key takeaways

Most deepfake attacks now land inside everyday meeting tools, not exotic channels
Fraud has gone cheap and fast: open-source versions of new generative models tend to appear one to three months after the proprietary ones, dropping cost from thousands of dollars to cents and setup time from hours to seconds
People cannot reliably spot deepfakes: even when told to look, human accuracy is close to a coin toss, and it is worse when nobody is expecting a fake
A convincing voice clone can be built from as little as 10 seconds of audio, then used in real-time calls, agent-driven call campaigns, or fake voicemails
Hiring is now a live target: one organization found it may have hired up to five people in a single quarter who were not who they claimed to be
Real-time detection needs only about four seconds of input and flags on the call in milliseconds, with a forensic report for the security team, rather than catching it after the loss
Physical "liveness" tricks like asking someone to turn sideways or wave a hand across their face work today, but they degrade as models improve week over week, so they are stopgaps, not a strategy

In this recording

1:14 The threat landscape: incident tracking, losses, and why deepfakes topped Gartner's list
6:04 Live fraud demos: fake receipts, damaged-item refunds, forged documents
8:24 Cloning a real voice from seconds of audio
10:39 Where the attacks happen: the full deepfake attack surface
13:17 Live communication attack types, from executive impersonation to hiring fraud
15:06 The tools attackers use: face swapping, voice cloning, replay attacks
16:28 Real deepfake meetings, including the now-viral "three fingers" clip
19:19 How to spot a deepfake on a live call
21:43 Real-time detection and the Resemble stack
31:36 Q&A and demo

What this webinar covers

How deepfake video call scams work, and why a live meeting is now the main attack surface
Why real-time face swapping defeats the "just get on a video call to verify" instinct
How voice-cloning vishing turns a familiar voice into an authorization tool
Where deepfake fraud is entering the hiring process through fake candidates
How AI image tools are used to fake receipts, IDs, insurance claims, and other documents
Practical signs, verification habits, and detection controls that reduce the risk

Deepfake video call fraud: the meeting is the main target

For years the standard advice for a suspicious email or wire request was simple: get the person on a video call and confirm. That instinct no longer holds. As Zohaib puts it in the session, the majority of deepfake attacks today happen inside the meeting environment, the same Zoom, Teams, and web conferences your teams live in every day.

The webinar shows a now-viral clip where the only reason a participant grew suspicious was a mismatched accent, and even then the suspected deepfake refused a simple challenge to hold three fingers in front of its face. A second, more advanced example shows a live face swap tracking a real person's head movements convincingly enough to pass. The takeaway is that seeing a face is no longer proof of who is on the call.

Publicly reported cases show the stakes. In the widely covered Hong Kong incident, a finance employee at engineering firm Arup authorized roughly 25 million dollars in transfers after joining a video conference where the CFO and every other attendee were real-time deepfakes. Verification has to move to something an attacker cannot fake in the moment, which is where out-of-band confirmation and real-time detection come in below.

Real-time face swapping: why "hop on a call" stopped being proof

Real-time face swapping is the engine behind those meeting attacks, and it is improving on a timescale of weeks, not months. What used to require a rendering pipeline and obvious artifacts now runs live and integrates directly into the conferencing and messaging apps everyone already uses, so nothing looks out of place to the victim. In the session, face swapping is cited as about 18 percent of deepfake attacks in 2025, a share expected to grow as the tools get cheaper and more reliable.

The important shift is that face-swap software has gotten good enough to slip past simple liveness checks that only look for basic movement. A synthetic face can nod, blink, and turn slightly while staying locked to the attacker's expressions. That means the defensive question is no longer "is this person moving," it is "is this video stream synthetic," which is a detection problem rather than a common-sense one.

There are still tells when you know to look. The ones covered in the session are listed in the "How to spot a deepfake on a live call" section below.

Deepfake vishing: a familiar voice as an authorization tool

Voice phishing, or vishing, is the phone-call cousin of the meeting attack, and it is often the cheapest to run. In the webinar, Will clones a recognizable executive voice from about 10 seconds of audio, then notes the same clone could drive an autonomous agent that calls employees at scale, or leaves a fake voicemail for a family member. Because people process a familiar voice faster than they question it, a cloned voice under time pressure can push someone to act before skepticism catches up.

The pattern is consistent. A finance or operations employee gets a call that sounds like the CFO, a vendor, or IT, and the ask is an urgent wire transfer, a payment reroute, or a credential reset. Voice deepfakes also pair with business email compromise, where a spoofed email sets up the story and the follow-up call closes it. This is not new in kind: in an early high-profile case in 2019, a UK energy firm was defrauded of about 220,000 euros on a cloned executive voice alone. What has changed is that the attack is now vastly cheaper and faster.

Because vishing has no visual component, the defense is procedural: no voice on the phone should be able to authorize money or access without a second, out-of-band confirmation on a channel the caller does not control.
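As a rough illustration, that rule can be written down as a small policy check. This is a minimal sketch under stated assumptions, not a description of any product: the action names, the Request and Confirmation types, and the may_proceed function are all hypothetical and would need to be mapped onto a team's real approval workflow.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action names for requests that a voice alone must never approve.
HIGH_RISK_ACTIONS = {"wire_transfer", "payment_reroute", "credential_reset"}


@dataclass(frozen=True)
class Request:
    action: str            # e.g. "wire_transfer"
    inbound_channel: str   # channel the request arrived on, e.g. "phone"
    claimed_identity: str  # who the caller says they are


@dataclass(frozen=True)
class Confirmation:
    channel: str                  # channel used to confirm, e.g. "directory_callback"
    contact_from_directory: bool  # True if the contact detail came from internal records,
                                  # not from the caller
    confirmed: bool               # True if the person reached actually confirmed the request


def may_proceed(request: Request, confirmation: Optional[Confirmation]) -> bool:
    """Allow a high-risk request only with an independent, out-of-band confirmation."""
    if request.action not in HIGH_RISK_ACTIONS:
        return True  # low-risk requests are outside this sketch
    if confirmation is None:
        return False  # a voice on the phone is never enough on its own
    out_of_band = confirmation.channel != request.inbound_channel
    return out_of_band and confirmation.contact_from_directory and confirmation.confirmed
```

The check deliberately ignores how convincing the caller sounded. Under this sketch, a request for a wire transfer received by phone proceeds only once someone has called back on a number taken from internal records and received a confirmation.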

Deepfakes in hiring: the fake candidate problem

The hiring pipeline is a quieter but fast-growing target. In the session, one organization is described as realizing it may have hired up to five people in a single quarter who were not who they claimed to be, with motives ranging from corporate espionage to simply collecting multiple paychecks. The FBI logged nearly 700 AI-related employment complaints in 2025, and those are only the cases that were caught.

This matters because a job interview is a video call that most teams do not treat as a security event. The same real-time face-swap and voice-clone tooling used in finance fraud lets a single operator present as many different people, pass a live screen, and clear onboarding. For any organization hiring remotely, the interview is now part of the fraud surface, and identity verification deserves the same rigor as a payment approval.

How to spot a deepfake on a live call

The webinar offers a set of physical challenges that still trip up real-time face-swap models today. Will is explicit that these work now but may not in a few weeks, so treat them as prompts to verify, not as proof:

Ask the person to turn a full 90 degrees to the side; models still struggle to render a jawline accurately in profile, especially with facial hair or fast movement
Have them wave a hand across their face, which often breaks the rendering
Have them hold a hand between their face and a lamp, since projecting an accurate shadow in real time is still hard
Pass a random object between the face and the camera and watch for distortion in the stream
Ask them to stick out their tongue or press a finger into their cheek; that kind of real-face distortion still confuses many models

If several tells line up, do not treat it as a verdict. Verify the person through another channel before acting on anything they ask for.
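For teams that want to write this into a runbook, the logic is simple enough to sketch. The example below is illustrative only: the challenge labels, the next_step function, and the threshold of two tells are assumptions rather than anything stated in the webinar. Note that it has no "verified" outcome, because passing the challenges is not proof.

```python
from typing import Dict, List

# Challenges from the list above; the short keys are illustrative labels.
CHALLENGES = {
    "profile_turn": "Turn a full 90 degrees to the side",
    "hand_wave": "Wave a hand across the face",
    "lamp_shadow": "Hold a hand between the face and a lamp",
    "object_pass": "Pass an object between the face and the camera",
    "face_distortion": "Stick out the tongue or press a finger into the cheek",
}


def tally_tells(observations: Dict[str, bool]) -> List[str]:
    """Return the challenges where the video visibly glitched, in sorted order."""
    return sorted(name for name, glitched in observations.items() if glitched)


def next_step(observations: Dict[str, bool], escalate_at: int = 2) -> str:
    """Map observed tells to an action. escalate_at is an assumed threshold."""
    unknown = set(observations) - set(CHALLENGES)
    if unknown:
        raise ValueError(f"unknown challenge labels: {sorted(unknown)}")
    if len(tally_tells(observations)) >= escalate_at:
        return "verify_out_of_band"
    # Clean results are not proof of identity, so there is no "verified" outcome.
    return "no_verdict"
```

Requests involving money or access would still go through out-of-band confirmation regardless of the result.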

How Resemble detects deepfakes in real time

The reason detection matters is that it does not depend on an employee catching a tell under pressure. Humans, as the session notes, are close to a coin toss at spotting fakes even when warned, while a detection model can beat that by a wide margin.

Resemble packages its detection model to run inside the meeting itself, capturing face swaps, voice clones, and AI filters as the call happens. It needs only about four seconds of input, returns a result in milliseconds, and alerts on the call rather than in post-processing. When something is flagged, the security team gets a forensic report showing which frame was analyzed, what manipulation technique was involved, and the model's confidence. The model is built to stay resilient across different platforms, codecs, and network conditions, with cited accuracy of 96 to 98 percent across the meeting platforms it targets.
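To make the shape of that flow concrete, here is a minimal sketch of scoring a rolling four-second window of a live stream. It is not Resemble's API. The detector callable, the Flag record, the threshold, and the stride are hypothetical stand-ins, and only the four-second window and the report fields (frame, technique, confidence) come from the session.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Iterable, Iterator, Sequence, Tuple

WINDOW_SECONDS = 4.0  # from the session: about four seconds of input


@dataclass(frozen=True)
class Flag:
    frame_index: int   # index of the newest frame in the flagged window
    technique: str     # label returned by the detector, e.g. "face_swap"
    confidence: float  # detector score, assumed to lie in [0, 1]


# Hypothetical detector: takes a window of frames, returns (technique, confidence).
Detector = Callable[[Sequence[bytes]], Tuple[str, float]]


def scan_stream(
    frames: Iterable[bytes],
    fps: float,
    detector: Detector,
    threshold: float = 0.9,  # assumed alerting threshold
    stride: int = 1,         # score every `stride` frames once the window is full
) -> Iterator[Flag]:
    """Yield a Flag whenever a full window scores at or above the threshold."""
    window_size = int(WINDOW_SECONDS * fps)
    if window_size < 1 or stride < 1:
        raise ValueError("fps and stride must give a window of at least one frame")
    window: Deque[bytes] = deque(maxlen=window_size)  # old frames drop off automatically
    for index, frame in enumerate(frames):
        window.append(frame)
        if len(window) < window_size:
            continue  # not enough input yet
        if (index + 1 - window_size) % stride != 0:
            continue  # skip frames between scoring points
        technique, confidence = detector(list(window))
        if confidence >= threshold:
            yield Flag(frame_index=index, technique=technique, confidence=confidence)
```

Because scan_stream is a generator, a caller can raise an alert on the first Flag while the call is still in progress and keep the later ones for the forensic record.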

Detection is the top layer of a broader stack the webinar walks through: an identity layer that answers whose likeness is involved, Resemble Watermarker (built on the PerTh Multimodal model) that lets an organization embed an invisible watermark to declare ownership of content it publishes, a layer that classifies what kind of fraud is occurring, an explainable forensic layer, and agents that can verify and initiate takedowns. It deploys on cloud, on-premise, or fully air-gapped for sensitive environments.

How to defend against deepfake fraud at work

Detection is one layer. The session, echoing Gartner's guidance that no single measure stops this, points to a multi-layered approach:

Require out-of-band confirmation for any payment, payment change, or access request, using a known-good number or channel the requester did not provide on the call
Establish verification code words or callback procedures for high-value or urgent finance requests
Treat video interviews and remote onboarding as identity-verification events, not just conversations
Run awareness training that includes voice and video deepfakes, not only email phishing, so employees have met a synthetic voice or face before meeting one in the wild
Add deepfake detection to high-risk channels such as contact centers, finance approvals, and identity verification, so a synthetic voice or video stream is flagged in real time rather than caught after the loss
Keep provenance in mind as regulation catches up; incoming rules such as the EU AI Act Article 50 transparency obligations point toward a future where AI-generated media is expected to be marked and identifiable

Frequently asked questions

What is deepfake fraud at work? Deepfake fraud at work is the use of AI-generated voice, video, or images to impersonate a trusted person, usually an executive, vendor, or job candidate, in order to authorize payments, extract data, or gain access. It most often shows up on video meetings, phone calls, and in remote hiring.

How do deepfake video call scams work? An attacker collects public images and audio of a target, builds a synthetic face and voice, then uses real-time face-swap software to replace their own webcam feed during a live meeting. The synthetic participant makes an urgent, plausible request, and the victim acts because they believe they are seeing and hearing a real colleague.

Can you really fake a face on a Zoom or Teams call in real time? Yes. Real-time face-swapping software runs live with low enough latency to hold a conversation, and it integrates with common conferencing and messaging apps. It is good enough to pass a routine call and can slip past basic liveness checks that only look for simple movement.

How much audio does it take to clone a voice? Very little. In the webinar, a recognizable executive voice is cloned from about 10 seconds of audio, and the clone can then be made to say anything, including in real-time calls and voicemails.

Can people tell the difference between a real and a deepfake voice or video? Usually not. Even when told to look for a fake, human accuracy is close to a coin toss, and it drops further when people are not expecting one, which is why detection has to be technical rather than left to employees.

What is deepfake vishing? Vishing is voice phishing, and deepfake vishing adds a cloned voice to the call. A short audio sample is enough to build a convincing clone of an executive or vendor, which attackers use to pressure employees into wire transfers, payment reroutes, or credential resets over the phone.

How do companies end up hiring deepfake candidates? Candidates use real-time face-swap and voice-clone tools during remote video interviews to present a synthetic identity that passes a live screen. Because most teams do not treat interviews as a security checkpoint, the fake candidate can clear screening and onboarding and gain internal access.

How can employees spot a deepfake on a live video call? Look for a face that is sharper or better lit than its surroundings, edge artifacts around the hairline during movement, and lip-sync drift. Asking the person to turn fully to the side, wave a hand across their face, or press a finger into their cheek still breaks many real-time swaps, though these tricks weaken as models improve.

How can businesses prevent deepfake fraud? Require out-of-band confirmation for money and access requests, set up callback procedures and code words, treat hiring as an identity-verification event, train employees on voice and video deepfakes, and add real-time deepfake detection to high-risk channels.

Does deepfake detection work in real time on calls? Yes. Detection can analyze audio and video streams as a call happens, needing only a few seconds of input and returning a result in milliseconds, which is why it suits contact centers, finance approvals, and identity verification, where catching a synthetic voice or face in the moment prevents the loss.
