The Emerging Threat of Political Deepfakes
A recent incident has thrust the dangers of deepfake technology into the spotlight again. An audio clip surfaced on Twitter last week, allegedly capturing British opposition leader Sir Keir Starmer swearing at staffers. But evidence suggests the recording was an AI-generated deepfake, not an authentic leak. This event highlights the growing threat that deepfakes pose to politics and to society more broadly.
What Are Deepfakes?
Deepfakes use AI techniques such as machine learning (ML) and neural networks to fabricate audio or video that falsely depicts people saying or doing things they never actually did. During training, the model learns from hours of authentic audio or video of the target person, picking up their facial movements, speech patterns, and vocal nuances. It then uses what it has learned to generate new synthetic media that realistically impersonates that person.
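Audio models rarely learn from raw waveform samples alone; a common (though not universal) first step is to convert audio into a time-frequency representation such as a spectrogram, where speech patterns and vocal timbre become visible. The sketch below is a minimal illustration, assuming NumPy, a mono signal, and a hypothetical 16 kHz sample rate with 25 ms frames and a 10 ms hop. It shows the kind of input features a model might train on, not the pipeline of any particular deepfake tool or of Resemble Detect.

```python
import numpy as np


def magnitude_spectrogram(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Return a (num_frames, frame_len // 2 + 1) magnitude spectrogram.

    Frame length and hop are illustrative defaults (25 ms / 10 ms at 16 kHz).
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    # Real FFT of each frame; keep only the magnitude.
    return np.abs(np.fft.rfft(frames, axis=1))


sr = 16_000                          # assumed sample rate
t = np.arange(sr) / sr               # one second of timestamps
tone = np.sin(2 * np.pi * 440 * t)   # synthetic 440 Hz test tone standing in for speech
spec = magnitude_spectrogram(tone)
peak_bin = spec.mean(axis=0).argmax()

print(spec.shape)              # (98, 201)
print(peak_bin * sr / 400)     # 440.0, the tone's frequency recovered from the spectrogram
```

Each row of the result is one short slice of time, and each column one frequency band; real speech would show shifting energy patterns across rows instead of a single steady peak.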
How Audio Deepfakes Are Created Using Text to Speech
Now that we have a general understanding of how machine learning underpins deepfake creation, we'll focus on how the Keir Starmer audio deepfake was likely created. The wide availability of AI voice generators and AI voice changers has put voice cloning within almost anyone's reach. Users can scrape the Labour Party leader's audio from the internet and upload it to an AI voice generator to clone his voice. Once the clone is ready, the user can generate new audio through text-to-speech (TTS) or speech-to-speech (STS) conversion. Below is a diagram of text-to-speech synthesis.
Voice AI Generators and Open-Source Accessibility
Although creating deepfakes originally required deep learning expertise, deepfake generation has become highly accessible. User-friendly apps like FakeApp and DeepFaceLab let anyone create AI-generated face swaps, including of celebrities. Open-source Python libraries such as Keras and TensorFlow lower the barrier to programming custom deepfake models. There are also websites that generate deepfake content for paying customers.
The proliferation of audio deepfakes has been accelerated by text-to-speech engines. These let users create a custom AI voice model of a prominent public figure like Starmer by uploading just a few minutes of sample audio. The app then synthesizes the cloned voice saying whatever text the user types in. This simplicity means AI expertise is no longer required to convincingly fake a person's speech.
The Dangers Associated with Political Deepfakes
A prime example of this accessibility recently emerged in British politics. The audio clip posted to Twitter allegedly captured opposition leader Sir Keir Starmer swearing at and berating a staff member. However, evidence soon indicated the recording was an AI-generated deepfake produced without Starmer's knowledge or consent. The clip exhibited subtle technical cues of fabrication: the speech sounded slightly unnatural, with odd pauses and misplaced emphasis.
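"Odd pauses" are something a listener notices by ear. As a purely illustrative heuristic (not how Resemble Detect works, and far too weak to identify a deepfake on its own), the sketch below flags silent gaps in a mono signal by thresholding short-frame RMS energy, which is one simple way to locate pauses for closer inspection. The 20 ms frame size, the -40 dB threshold relative to the loudest frame, and the 0.3 s minimum pause length are arbitrary assumptions.

```python
import numpy as np


def find_pauses(signal, sr, frame_ms=20, threshold_db=-40.0, min_pause_s=0.3):
    """Return (start_s, end_s) tuples for runs of low-energy frames.

    All thresholds are illustrative assumptions, not tuned values.
    """
    frame_len = int(sr * frame_ms / 1000)
    num_frames = len(signal) // frame_len
    frames = signal[: num_frames * frame_len].reshape(num_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Energy of each frame in dB relative to the loudest frame (floored to avoid log(0)).
    db = 20 * np.log10(np.maximum(rms, 1e-12) / max(rms.max(), 1e-12))
    silent = db < threshold_db

    pauses, start = [], None
    for i, is_silent in enumerate(silent):
        if is_silent and start is None:
            start = i                      # a quiet run begins
        elif not is_silent and start is not None:
            if (i - start) * frame_len / sr >= min_pause_s:
                pauses.append((start * frame_len / sr, i * frame_len / sr))
            start = None                   # the quiet run ends
    # Handle a pause that runs to the end of the clip.
    if start is not None and (num_frames - start) * frame_len / sr >= min_pause_s:
        pauses.append((start * frame_len / sr, num_frames * frame_len / sr))
    return pauses


sr = 16_000
tone = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s placeholder for speech
gap = np.zeros(sr // 2)                              # 0.5 s of silence
audio = np.concatenate([tone, gap, tone])
print(find_pauses(audio, sr))  # [(1.0, 1.5)]
```

Real speech contains many natural pauses, so a heuristic like this only surfaces candidates; judging whether a pause sounds unnatural still needs a trained model or a human listener.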
In addition, the Twitter account that posted the clip had a history of spreading unsubstantiated claims about Keir Starmer. Despite this track record, the incriminating deepfake still gained traction. With millions of listens, it likely left many listeners with a newly negative impression of Starmer.
Resemble Detect’s Deepfake Analysis of the Clip
Because the deepfake audio clip was publicly available, we were able to analyze it with our deepfake detector, Resemble Detect. Below is a live demo of Keir Starmer’s audio being analyzed by Detect.
Resemble Detect’s deepfake detection model at work analyzing Keir Starmer’s audio.
Within seconds, the real-time AI voice detector returned a resounding deepfake prediction of 100%. Represented below is the deep neural network’s analysis of Starmer’s deepfake audio file. The model analyzes the audio in 2-second increments, shown on the x-axis. The y-axis shows the predicted probability that each increment is fake, with the bold red line at 1.00, or 100%.
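The per-segment analysis described above can be mimicked with a simple windowing loop. The sketch below assumes a hypothetical `dummy_score_segment` function standing in for a trained detector (Resemble Detect's actual model and API are not shown here): it splits a clip into consecutive, non-overlapping 2-second windows, scores each one, and reports the per-window probabilities alongside their maximum. Skipping a trailing fragment shorter than one second and aggregating with the maximum are both assumptions; mean or majority vote are common alternatives.

```python
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical detector interface: (segment, sample_rate) -> P(fake) in [0, 1].
Scorer = Callable[[np.ndarray, int], float]


def score_windows(signal: np.ndarray, sr: int, score_segment: Scorer,
                  window_s: float = 2.0) -> List[Tuple[float, float]]:
    """Score consecutive, non-overlapping windows; return [(start_seconds, probability)]."""
    win = int(sr * window_s)
    results = []
    for start in range(0, len(signal), win):
        segment = signal[start : start + win]
        if len(segment) < win // 2:  # assumption: drop a very short trailing fragment
            break
        results.append((start / sr, float(score_segment(segment, sr))))
    return results


def dummy_score_segment(segment: np.ndarray, sr: int) -> float:
    # Hypothetical stand-in for a real model; always reports "fake" with certainty.
    return 1.0


sr = 16_000
clip = np.zeros(sr * 7)  # 7-second placeholder clip
scores = score_windows(clip, sr, dummy_score_segment)
print(scores)                     # [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0), (6.0, 1.0)]
print(max(p for _, p in scores))  # 1.0
```

Plotting the start times against the probabilities would give a chart with the same axes as the one described above: time in 2-second steps on the x-axis and probability on the y-axis.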
Resemble AI Takes A Stand For The Democratic Process
As politicians across the spectrum brace for more deepfake content, this incident shows the perfect storm that political deepfakes can create. Ambiguity about the content's origin, combined with its inflammatory nature, makes it a powder keg for misinformation. If deployed strategically before an election, such content could unfairly sway the outcome. Even if a clip is later disproven, the damage can't be fully undone, undermining democratic principles. That is why our ML team at Resemble continues to prioritize research into effective deepfake detection techniques, to stay ahead of advances in deepfake audio technology.