What is Deepfake Vishing: Strategies for Prevention and Response

Vishing, a combination of ‘voice’ and ‘phishing,’ is an increasingly sophisticated form of cybercrime where fraudsters use telephone systems to deceive individuals into revealing sensitive information, such as bank details or personal identification numbers. This deceptive practice leverages the inherent trust people place in phone communications, and with the advent of deepfake technology, vishing has evolved into a more formidable threat.

The basic mechanism of vishing involves scammers impersonating representatives from legitimate organizations like banks, government agencies, or well-known companies. Utilizing deepfake technology, these fraudsters can now create highly convincing audio simulations, making their deceptions more believable than ever. For instance, a scammer might use a deepfake audio clip that mimics a familiar voice, such as a company executive or a family member, to persuade victims into sharing confidential information or transferring funds.

Recent years have seen a notable rise in vishing incidents, thanks in part to the proliferation of deepfake and voice spoofing technologies. As reported by the Federal Trade Commission, thousands of vishing cases are registered annually, leading to significant financial losses. The real extent of this problem is likely greater, with many cases going unreported.

Examples of Vishing Attacks

The statistics are clear: vishing attacks have grown 550% in 12 months, and nearly 7 out of 10 IT professionals reported having received a vishing call in 2021, a 54% increase since 2020.

Several high-profile cases highlight the threat of vishing. For instance, MGM Resorts fell victim to a vishing attack that cost the Las Vegas casino giant about $100 million. The attackers used sophisticated tools for spoofing phone numbers, manipulating caller ID information, and impersonating callers, showing how technological advances contribute to the prevalence of vishing attacks.

Another case involved a vishing attack on IT departments, where the attackers posed as clients or fellow employees needing help. Using social engineering techniques, they were able to obtain sensitive information such as employee IDs, passwords, and other critical infrastructure details. This complex operation involved different levels of deceit and manipulation, leading to significant financial losses.

Examples of Deepfake Vishing

Deepfakes, which use artificial intelligence to create hyper-realistic but fake audio or video content, have been increasingly used in vishing attacks. Here are a few examples:

CEO Scam: In one of the first known instances of a voice deepfake used in a scam, the CEO of a UK-based energy firm was tricked into transferring approximately $243,000 to a Hungarian supplier. The fraudsters used AI-generated voice deepfake technology to mimic the voice of the CEO of the parent company, convincing the victim that he was speaking with his boss.

Cryptocurrency Specialist Targeted: In another case, attackers targeted a cryptocurrency specialist using a multistage social engineering attack. They used deepfake technology to alter or clone voices in real time, creating an artificial simulation of the impersonated person’s voice. This made the impersonation attempts appear credible, enabling the attackers to carry out their scheme.

Deepfake Call Centers: There are also instances where entire call centers are dedicated to vishing, using deepfake technology to clone voices and carry out large-scale attacks. For example, a call center was discovered in Ukraine where deepfake technology was used for malicious vishing attacks.

These examples highlight the growing threat of vishing attacks that leverage deepfake technology. As this technology becomes more sophisticated and accessible, the risk of falling victim to such attacks increases, underscoring the need for effective detection and prevention measures.

How to Avoid Vishing with Resemble Detect

Resemble Detect, developed by Resemble AI, is a state-of-the-art neural model designed to expose deepfake audio in real time. It works across all types of media and against all modern speech synthesis solutions. By analyzing audio frame by frame, it can accurately identify and flag artificially generated or modified audio content.

Deepfake Detection results for Joe Biden Robocall

Resemble Detect uses a sophisticated deep neural network that is trained to distinguish real audio from synthetic audio. It exposes controls so that it can be tuned to each application's requirements: users can set the granularity of the analysis, the sensitivity to false positives, and vocal isolation to remove noise. The process involves configuring detection settings to match specific security needs and then uploading audio for analysis.
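To make the role of these settings concrete, here is a minimal Python sketch of how granularity and sensitivity could shape a frame-by-frame verdict. It is an illustration under stated assumptions, not Resemble Detect's actual API or algorithm: the names `DetectionSettings`, `flag_windows`, and `is_likely_deepfake` are hypothetical, the per-frame scores are assumed to come from some detection model, and vocal isolation is left out entirely.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DetectionSettings:
    """Hypothetical settings, loosely mirroring the controls described above."""
    frames_per_window: int = 4  # granularity: frames grouped into one decision
    threshold: float = 0.5      # sensitivity: higher means fewer false positives


def flag_windows(frame_scores, settings):
    """Group per-frame scores into windows and flag the suspicious ones.

    frame_scores holds one float in [0, 1] per audio frame, where a higher
    value means "more likely synthetic". A real detector would produce these;
    in this sketch the caller supplies them.
    Returns a list of (start_frame, mean_score, flagged) tuples.
    """
    size = settings.frames_per_window
    results = []
    for start in range(0, len(frame_scores), size):
        window = frame_scores[start:start + size]
        score = mean(window)
        results.append((start, score, score >= settings.threshold))
    return results


def is_likely_deepfake(frame_scores, settings, min_flagged_fraction=0.5):
    """Treat the clip as suspicious when enough of its windows are flagged."""
    windows = flag_windows(frame_scores, settings)
    if not windows:
        return False
    flagged = sum(1 for _, _, hit in windows if hit)
    return flagged / len(windows) >= min_flagged_fraction
```

In this sketch, raising `threshold` trades missed detections for fewer false positives, while a smaller `frames_per_window` localizes suspicious segments more precisely at the cost of noisier per-window scores.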

Resemble Detect is essentially a sophisticated AI ear that listens for the very subtle sonic artifacts inherent in any manipulated audio. Regardless of how the sound is adjusted, those clues remain, and Resemble Detect can use them to assess the likelihood that the audio is a deepfake.

Resemble AI has always prioritized safety and ethics when building generative AI. The company has introduced tools like the AI Watermarker to protect data from being used by unauthorized AI models. By watermarking data, users can verify if an AI model used their data during its training phase.

Resemble AI’s goal is to empower customers with an enterprise-grade solution to recognize voice deepfakes. The Deepfake Detection dashboard highlights scalability, accuracy, reliability, and voice isolation capabilities as key features.

In the context of vishing attacks, Resemble Detect can play a crucial role in identifying and preventing fraudulent activities. As deepfake technology becomes more sophisticated and accessible, tools like Resemble Detect become increasingly important in safeguarding individuals and organizations from falling victim to such attacks.
