How Resemble AI Created Andy Warhol Docu-series Narration Using 3 Minutes of Original Voice Recordings

Mar 9, 2022

The elusive words of pop art icon Andy Warhol will come to life for the first time in the March 9 Netflix premiere of The Andy Warhol Diaries, using Resemble AI’s generative voice technology.

“I felt that the AI voice would honor two hallmarks of Andy’s life and artistic practice, stemming from his desire ‘to be a machine,’” says director Andrew Rossi in Entertainment Weekly. “Andy admired the fact that ‘machines have less problems,’ saying that, ‘I do have feelings, but I wish I didn’t.’ He even had himself made into a robot and a hologram during his lifetime, and he said, ‘the reason I’m painting this way is that I want to be a machine.’ I thought that cloning Andy’s voice could function like a Warholian portrait, and the [Andy Warhol] Foundation approved.”

(Consent is a requirement for all Resemble AI projects and transparency is maintained throughout the process, as seen in this video about how Resemble AI voice cloning works.)

Andy Warhol’s voice was crafted to the director’s performance requirements using Resemble AI’s synthetic speech engine. Rossi, an Emmy-nominated director, and his team could generate and tune new iterations of each line in seconds. Using Resemble AI’s web platform, the team adjusted the emotion and pitch of Andy Warhol’s AI voice.

How Was This Made Possible?
Since most of Andy Warhol’s audio recordings are archived from the 1970s and 80s, there isn’t an abundance of audio data available. After sifting through all of the data, Resemble AI accumulated just 3 minutes and 12 seconds of usable data.

Creating A Voice Model From 3 Minutes

Resemble AI’s proprietary deep learning models can recreate voices from minimal data. Because they build on a large foundational model and a modern deep learning architecture, they can adapt to a new voice from just a handful of samples.

Andy’s data was cleaned and normalized through Resemble AI’s neural data pipeline. Once a dataset is uploaded, the automated pipeline extracts various features and computes numerous metrics to select which parts of the dataset are used for training. As with other machine learning pipelines, the quality of the input data has a significant impact on the quality of the output.

Resemble AI exposes the results of this analysis to the user, who can then try to repair as much of the rejected data as possible.
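To make the idea of cleaning, normalizing, and filtering a small dataset concrete, here is a minimal sketch in Python. It is not Resemble AI’s pipeline, which is proprietary and not described in detail here. The `Clip` type, the function names, and the thresholds (`min_seconds`, `min_rms_db`) are all assumptions chosen for illustration; a real pipeline would compute many more metrics.

```python
import math
from dataclasses import dataclass


@dataclass
class Clip:
    # Hypothetical container for one recording (not a Resemble AI type).
    name: str
    samples: list  # mono audio as floats in the range [-1.0, 1.0]
    sample_rate: int


def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent clip: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def rms_db(samples):
    """Root-mean-square level in decibels relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def select_usable(clips, min_seconds=1.0, min_rms_db=-40.0):
    """Keep clips that are long and loud enough; report why others failed."""
    kept, rejected = [], []
    for clip in clips:
        duration = len(clip.samples) / clip.sample_rate
        if duration < min_seconds:
            rejected.append((clip.name, "too short"))
        elif rms_db(clip.samples) < min_rms_db:
            rejected.append((clip.name, "too quiet"))
        else:
            normalized = peak_normalize(clip.samples)
            kept.append(Clip(clip.name, normalized, clip.sample_rate))
    return kept, rejected
```

Returning the rejected clips together with a reason mirrors the idea of showing the analysis to the user, so that a clip that is merely too quiet or too short can be fixed and resubmitted instead of silently dropped.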

Adding Performance With Tunable Knobs
Once Andy Warhol’s voice model was ready to use, the creative team simply imported all of the lines from “The Andy Warhol Diaries” into the web-based editor. This gave them a baseline: how the AI would deliver each sentence before any adjustments.

Using the intuitive web authoring tool, the creative team behind the docu-series could go in and tweak the output to their liking. This could be anything from slowing down portions of the delivery, to creating rising or falling inflections.

A view of Resemble AI’s emotion editor used to create specific styles and tones for delivery.
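Adjustments such as slowing a phrase or raising its pitch are often expressed in text-to-speech systems with SSML, the W3C Speech Synthesis Markup Language. The sketch below builds that kind of markup in Python. It is a generic illustration: this article does not say how Resemble AI’s editor represents these adjustments internally, and the helper names `prosody` and `speak` are hypothetical.

```python
from xml.sax.saxutils import escape


def prosody(text, rate=None, pitch=None):
    """Wrap text in an SSML <prosody> element with the given attributes."""
    attrs = ""
    if rate is not None:
        attrs += f' rate="{rate}"'
    if pitch is not None:
        attrs += f' pitch="{pitch}"'
    # escape() protects characters such as & and < inside the spoken text.
    return f"<prosody{attrs}>{escape(text)}</prosody>"


def speak(*parts):
    """Join already-marked-up fragments into one SSML document."""
    return "<speak>" + " ".join(parts) + "</speak>"


# Slow down the first half of a line and raise the pitch of the second.
line = speak(
    prosody("I want to be", rate="slow"),
    prosody("a machine.", pitch="+10%"),
)
print(line)
```

Keeping each adjustment attached to a span of text, instead of to the whole line, is what makes it possible to change only a portion of the delivery.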

Adding Final Touches With Style Transfer

Although the knobs were satisfactory for the creation of some lines, they weren’t enough to get the delivery exactly the way that the creative team wanted it. Some of the lines needed further tuning and tweaking to get the right emphasis and pronunciation.

This is where Resemble AI’s advanced style transfer technique came into play. With it, the creative team could take a reference audio clip of another speaker delivering the sentence and get the same delivery back in Andy Warhol’s voice. This flexibility sped up the team’s work and let them carry human-like imperfections from the reference performance into the output, which made it far more engaging.

The Future of Entertainment
Generative audio is one of the most incredible and underutilized areas of AI. It has the ability to change the way that we create and interact with content. It opens up new possibilities for entertainment and storytelling.

Resemble’s Voice AI platform makes it possible to create entire movies, TV shows, and video games with AI-generated voices. This will allow for more creative freedom and new forms of expression—all rooted in consent and transparency.

Resemble AI’s technology is being used by some of the largest media companies in the world to create content that was previously impossible. Whether it’s transferring a voice into dozens of other languages, creating thousands of dynamic personalized messages from celebrities, or creating unique real-time conversational agents, Resemble AI is changing how content is created.

With Resemble AI, creating engaging, high-quality voice content is easier than ever. Content creators can add a new level of authenticity to their work and a new level of immersion for their audience.

To learn more about how Resemble AI maintains ethical standards across the industry, visit our Ethics page.
