The podcast Radiolab referred to him as the “Sherlock Holmes of digital misdeeds.” As a computer scientist at Dartmouth, Dr. Hany Farid is often hired as an expert to determine whether photos and videos have been distorted. When asked in 2017 about what has come to be known as deepfake technology, software that allows people to make “fake videos…that look and sound just like the real thing,” Farid delivered an ominous message. “Yes, you should [be terrified]…the ability of technology to manipulate and alter reality is growing at a breakneck speed.”
At the Adobe MAX event in November 2016, Adobe demonstrated a prototype of new audio-editing software it called VoCo. The software allows users to edit speech by typing over a transcript of a recording. Although Adobe has yet to announce a commercial release date for VoCo, the demo sparked considerable controversy. Experts warned that it could pose security threats and further reduce trust in journalism. Deepfake videos have existed for only a few years, but machine learning has advanced rapidly since the first ones popped up on Reddit in 2017. In 2019, machine learning technology can produce videos and images beyond the wildest nightmares of VoCo’s critics in 2016.
The first deepfakes built upon technology that allowed animators to lend human likeness to movie characters. When filming movies such as Toy Story or Avatar, animators used sensor markers to record actors’ expressions and transfer them to their characters. Deepfake technology built on this concept, using a collection of pictures and videos to create fake videos. In 2016, researchers at the University of Erlangen-Nuremberg, the Max Planck Institute for Informatics, and Stanford University developed video reenactment software known as Face2Face that allows people to use a webcam to modify existing videos through “real-time face tracking.” In other words, the program allows users to replace the words and expressions of people in other videos with words and expressions of their own. The following year, at the annual computer graphics conference known as Siggraph, researchers from the University of Washington, as well as the VISTEC institute in Thailand, presented a project entitled “Synthesizing Obama: Learning Lip Sync From Audio.” In the video presented at the 2017 conference, a deepfake video of President Obama was generated to match a real audio recording. The researchers trained a computer on hours of Obama’s weekly addresses to create the video. The computer learned to map mouth shapes from Obama’s audio files alone; the resulting mouth shapes were then grafted onto the head of a person from another video.
Later that same year, fake videos emerged on social media. A Reddit user going by the name “deepfakes” began sharing pornographic videos featuring celebrities like Gal Gadot and Taylor Swift that had been modified with machine-learning software built on open-source libraries such as Keras and TensorFlow. Reddit quickly caught on. The user deepfakes named a subreddit after himself, which gained over 15,000 subscribers in a two-month period, and dozens of other Reddit users began making and posting deepfake pornography.
By 2018, deepfake technology had exploded. Another Reddit user, deepfakesapp, created FakeApp, a desktop application that allows anyone, not just the tech savvy, to create deepfake videos. By August 2018, Stanford researchers had debuted a technology known as Deep Video Portraits at the Siggraph conference. Deep Video Portraits can render an entire synthetic person and background, and the technology allows users to swap faces or change a person’s actions “without explicitly modeling hair, body, or background.” Whereas earlier technologies, such as the one used to synthesize President Obama just a year earlier, relied on a head taken from another video to produce a deepfake, Deep Video Portraits’ powerful AI “takes care of everything by itself.”
Following her first story covering the deepfake phenomenon, Vice writer Samantha Cole spoke with computer scientist Peter Eckersley about the future of deepfake technology. In January 2018, two months after Reddit user deepfakes released his fake pornography videos in November 2017, Eckersley’s predictions already looked modest. He had acknowledged that a closer look at 2017 deepfakes revealed them to be fake, but predicted that within “a year or two” the production would be more advanced. Cole noted that this prediction didn’t even hold for two months: soon after the release of user deepfakes’ first videos, similar fake porn videos were popping up all over Reddit.
In 2019, deepfake technology has moved at an even faster pace. In May 2019, researchers at Samsung’s AI lab in Moscow unveiled “one-shot learning,” a process by which they taught a computer to make “living photographs” from just a single image. To do this, Samsung researchers trained their algorithm on a large set of talking-head videos so that it could generate realistic facial expressions from single images. Through one-shot learning, researchers were able to animate famous portraits like the Mona Lisa. This is a giant leap ahead for machine learning technology, which previously relied on a whole catalogue of images to animate the subject at hand. And at this year’s Siggraph conference in July, researchers from Princeton, Stanford, and the Max Planck Institute for Informatics will present a speech editing study they conducted using Adobe VoCo.
But it’s mainstream news that has revealed what is perhaps deepfakes’ most impactful role to date. Recently, a fake video of U.S. House Speaker Nancy Pelosi was released that made it seem like she was drunk while giving a speech. (To be clear, that was a cleverly edited video, not a deepfake.) The Pelosi video was followed by a fake video of Facebook CEO Mark Zuckerberg, an artist’s clever response to Facebook’s refusal to take down the fake Pelosi video. In a country where “fake news” was already a problem before fake videos were technologically possible, the deepfake revolution ushers in frightening political prospects and a new era of eroding civic discourse. And with the technology developing faster than experts predicted just two years ago, machine learning manipulation appears to be getting harder and harder to control. In light of the rapid progress in machine learning technology over the last three years, it seems wise to heed Dr. Farid’s 2017 warning.