Finding your Synthetic Voice!


Overview

It all started with the cost and inefficiency of scheduling actors to re-record parts of a movie’s dialogue that had problems such as interference from ambient sound. Re-recording parts of a dialogue track and then carefully replacing the original dialogue in the movie was a tedious process, known as ADR, or automated dialogue replacement.

Fast forward to the age when complete movies are based on animation and major scenes in actor-based movies are created from a virtual set, creating those amazing Star Wars battle scenes. The next logical question was: why are we limited to actual human voices in the dialogue?

Listen to the Podcast

The emergence of AI is powering a new approach to analyzing voices: identifying and tagging speaking patterns by age, gender, education, cultural heritage, and accent in order to fix audio tracks in movies. This podcast discusses Resemble AI’s technology for creating a synthesized voice designed to fit a character in a movie, animation, or audio dialogue.
Here are two examples of a synthesized voice:

Our conversation reviewed one of the major challenges in casting a movie: finding actors who have both the image and the voice to convincingly represent a character in the story’s setting. Resemble AI did much of the early work here and recognized that building a database of recorded voices would enable the company both to replicate existing voices and to create new synthetic voices configured to fit a movie’s setting. The technology can create a voice configured for a specific age, gender, education, and profession, all within a specific cultural background and complete with a regional accent.

Casting directors can usually find an actor with the right image and acting style, but getting the voice right is another challenge. That same actor must also have the right voice in terms of accent, cadence, style, etc., or the ability to replicate a defined speech pattern.

The rise of synthetic, customized voice technology also provides the means to replicate the voice of a famous person making a statement, with the content simply typed in at the keyboard. This complete voice replication feature was one area we discussed that is open to abuse in the age of fake information: fake news, deep fakes, and fake voices.

Key Concepts

The movie entertainment business continues to experience waves of disruption. In the past, movie studios controlled everything from approving a film’s budget and moving it into production to all areas of casting, production, and final distribution. Today, the ability to create virtual movies, that is, movies created with computer graphics and computer vision, has changed the nature of movie and video production.

It’s easy to overlook the necessity of casting the right voice. Voice is critical to the credibility and authenticity of a character and to how that character’s role fits into the overall story. Have you ever noticed when an American actor is not credible in mimicking the accent of a character in a film about 18th-century Europe?

One area that has been far behind but is suddenly catching up is audio voice technology. Until recently, the old-school approach was to record an actor with a video camera, enhance the audio with a variety of audio tools such as “equalizers,” and fix low-quality audio with dialogue replacement. It was an expensive and inefficient procedure, but it was standard practice.

This synthetic voice technology does pose some serious threats to a society built on free speech. Holding people accountable for past statements may no longer be possible if we cannot ensure the integrity of a recorded audio statement, with or without the video. As was suggested in the video and podcast, Resemble AI may need to add a certification technology to verify that an audio file really does come from the person whose voice it represents.