How Game Developers Can Create Synthetic Speech Using Our Open-Source Unity Plugin

May 14, 2020

We’re proud to release and open-source our Unity plugin, which enables game developers to create and embed speech content within a familiar workflow.

Resemble’s Unity plugin extends Resemble Clone, a product that lets users record a few sentences in their own voice and immediately generate high-quality samples. With just three minutes of audio data, Resemble Clone can create the speech track for immersive VR experiences, digital characters, or Alexa Skills. For game developers, this addresses one of the biggest creative bottlenecks: crafting speech content.

Creating speech content right within the Unity Editor

How Resemble’s Unity Plugin Works

Developers can sign up for an account on Resemble’s web platform and create their own voices.
Resemble’s free Unity plugin is open source and can be installed directly from GitHub: https://github.com/resemble-ai/resemble-unity-text-to-speech
With a built-in editor, developers can quickly add content through the GUI within Unity.
Developers can also generate content directly from scripts.
Using the editor, users can tweak speech style and emotion down to the word level by highlighting text and applying emotions to it.
Unity developers can access pre-built emotions right within their editor and can import any audio they create on the web platform. More information is available on the documentation page: https://www.resemble.ai/unity-docs/

Adding Style with Speech Gradients

Resemble has also enhanced Resemble Clone with Speech Gradients. Through the editor, users can modify the intonation and inflection of speech down to the individual word, making it possible to craft speech content carefully and with the flexibility that creative work requires. By interpolating between emotions, Resemble enables extremely granular control over speech content.
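How Speech Gradients work internally isn't described here, but the general idea of interpolating between emotions can be illustrated with a small sketch. The Python below assumes, purely for illustration, that an emotion style is a set of named intensity weights and that a gradient moves linearly from one style to another across the words of a line. The names `Style`, `lerp_style`, and `gradient_over_words` are hypothetical and are not part of Resemble's plugin or API.

```python
from typing import Dict, List, Tuple

# Hypothetical representation (an assumption, not Resemble's format):
# an emotion style maps an emotion name to an intensity weight.
Style = Dict[str, float]


def lerp_style(start: Style, end: Style, t: float) -> Style:
    """Linearly blend two styles: t=0 reproduces start, t=1 reproduces end."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    # Emotions missing from one style are treated as having weight 0.
    emotions = set(start) | set(end)
    return {
        name: (1.0 - t) * start.get(name, 0.0) + t * end.get(name, 0.0)
        for name in sorted(emotions)
    }


def gradient_over_words(
    words: List[str], start: Style, end: Style
) -> List[Tuple[str, Style]]:
    """Give each word a style that moves evenly from start (first word) to end (last word)."""
    if not words:
        return []
    if len(words) == 1:
        return [(words[0], lerp_style(start, end, 0.0))]
    last = len(words) - 1
    return [(word, lerp_style(start, end, i / last)) for i, word in enumerate(words)]


if __name__ == "__main__":
    # Illustrative usage: a line that starts calm and ends excited.
    line = "I can't believe we actually won".split()
    calm = {"neutral": 1.0}
    excited = {"happy": 0.8, "surprised": 0.2}
    for word, style in gradient_over_words(line, calm, excited):
        print(f"{word:>8}  {style}")
```

Linear interpolation is simply the easiest blend to reason about; a production system might ease in and out or interpolate in a learned representation instead.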

How Synthetic Voices Will Change Game Development

Since the early days of gaming, players have sought to be ever more immersed in the fantasy of the game. For years, developers have looked for ways to shift the experience from controlling the game to actually being part of it. Realistic storylines in RPGs, live roster updates in sports games, and real-time communication with other players online are among the advances that have drawn players further into the gameplay experience.

In most video games today, voiceovers play a crucial role in the quality and authenticity of the gameplay experience. Gamers who grew up in the 90s may recall playing “NBA JAM” with friends, a game whose commentary consisted of a few somewhat tacky, pre-programmed, and repetitive lines. Today, sports games such as the popular NBA 2K series bring players much closer to the in-game action with personalized characters, storylines, player interviews, and dynamic commentary.

What goes into creating such a vibrant experience? The voice pipeline is surprisingly simple: while developers have refined other aspects of gameplay over the years, voice technology in games has remained relatively primitive. Developers hire voice actors and give them a script for their roles. After recording, the audio is edited and placed into game scenes and cues. This process has produced some exciting and realistic experiences, but it is time-consuming and tedious, and the content is still limited to what the voice actors actually recorded.

Resemble aims to broaden the possibilities for voiceovers in games while streamlining the development process. With this technology, customized voiceovers could be generated from a relatively small sample of an actor’s recorded voice. In some cases, a voice actor may not be needed at all, because voices can be generated without cloning a pre-recorded voice. The range of emotion, tone, and even content would also be far wider than what a fixed recording session can capture. The benefits would be felt across the gaming industry, from developers reducing their reliance on large-scale recording studios and lengthy recording sessions, to players receiving dynamic in-game content that can be readily updated.

Examples of how this could enhance the player experience include RPG characters with finely tuned, human-like voices, dialogue with realistic emotional delivery, and linguistic diversity both across and within characters. To deliver these enhancements with today’s method, a studio would need a voice actor who speaks multiple languages at a professional level, and perhaps separate recording sessions for each language and emotion. Even after this lengthy process, the voiceovers would be limited to the content actually recorded. With synthetic voice technology, a small sample of the actor’s voice could instead be adapted to each language and emotion that suits the character’s personality and circumstances within the game. Dialogue would also no longer be capped by what was recorded, opening up a broader range of experiences for players.

While there is still work to be done in synthetic voice technology, we have come a long way from the days of video games with little to no dialogue. Bringing characters to life has expanded what games can do and has driven the evolution of the gaming industry as a whole. With Resemble, developers and players alike have a lot to be excited about for the future of gaming.
