Internet Gold | Can You Trust Your Ears? Two Minute Papers (Pt 2 of 2)
This article is part 2 of my 2-part series about voice audio manipulation. You can find part 1 here.
In part 1, I discussed several methods of faking voice recordings. However, part 1 was just the appetizer for our main course -- which is this article. In part 1, I intentionally left out one particular voice manipulation method: the most powerful of them all, and, to be honest, also the scariest.
In the movie Terminator 2 (1991), Arnold Schwarzenegger played a reprogrammed terminator. A terminator is a humanoid robot built to kill people in a dystopian future. The terminator played by Arnold was reprogrammed and traveled back in time to save John, a young boy, from the more advanced T-1000, a terminator that was sent back in time to kill that same boy.
In one memorable scene, the T-1000 has shape-shifted into the boy's foster mother. The foster father does not notice, and after he says something that annoys the disguised terminator, it kills him... Meanwhile, the "Arnold terminator" is with the boy in a phone booth. The T-1000 is not aware that Arnold traveled back in time as well. Arnold speaks in the boy's voice and uses a fake name for the boy's dog. Had Arnold been talking to John's real foster mother, she would have noticed the discrepancy. However, the T-1000 apparently does not know the dog's name and does not notice the trap Arnold has set for it.
The two terminator robots have a conversation over the phone. The T-1000 pretends to be the boy's foster mother, Arnold pretends to be the boy. How do they try to fool each other? By imitating the corresponding human voice. In fact, they do it so perfectly that nobody can tell whether they are speaking to the real human or to a robot.
The shape-shifting T-1000 can not only take on the appearance of pretty much any solid object of roughly its own size, it can also imitate any human voice. It matches not only the tone of a voice but also the little peculiarities of how a person speaks. Arnold's terminator is an earlier model and cannot shape-shift, but voice generation had already been mastered by the time he was built (or maybe that feature was added via an over-the-air update -- who knows 🤷‍♂️).
The point is: In 1991, when Terminator 2 came out, realistic computer-generated voices were just as much science fiction as shape-shifting.
What do the terminators need in order to imitate a person's voice perfectly? A few seconds of hearing that person speak is enough. In that science-fiction movie from 1991.
Let's compare that to real science from 2019.
The result: About 5 seconds of voice recording is enough to imitate a person's voice and way of speaking almost perfectly.
How crazy is that? 1991's science fiction became pretty much a reality 28 years later!
And keep in mind that this is the worst that computer-generated voice audio will ever be. (It can only get better from here.) This paper is also almost one year old, so there are probably even more capable AIs out there that run circles around this one...
So, is there a way to tell real voices from fake ones? There might be, but it has already become very hard for a human to decide what is real and what is not. Our best bet is probably to use an AI to distinguish between artificial voice output and a real person speaking.
But there is a catch: Every improvement in detection is usually followed closely by an improvement in generation. The reason is that a detection AI can be used to "train" (improve) a generating AI. It's a vicious circle. In the future, it will only get harder for humans to decide what is real and what is not... At some point, it may become impossible to tell real and fake voices apart -- just like in the movie.
Then we should just rely on video instead? Good idea, but have you heard of deepfakes? That is the topic of the next installment of this Internet Gold series... And what we discuss there might shock you. You might want to sit down for that one. But don't worry, not today. I will probably finish writing that article tomorrow, so stay tuned!
Have you heard about The Comment League, which fellow author @Macronald wants to launch in November 2020? Its goal is to increase meaningful discourse and reader interaction by rewarding readers who write high-quality comments. Check it out! 😊
If you are tech-savvy, head over to this GitHub repository, where you can download an unofficial implementation of the AI from the 2019 research discussed above. You can install the software by following the steps there (it is not super hard, but also not beginner-friendly). Once it is installed, you can give it a short recording of your own voice as a sample. If everything works out, your computer will speak almost exactly like you. How cool is that?!