Deepfakes: Big Tech Fights Back

Season 2 • Episode 6

Deepfakes, those computer-generated videos of well-known people saying things they never actually said, strike a lot of experts as terrifying. If we can’t even trust videos we see online, how does democracy stand a chance?

As photo- and video-manipulation apps get cheaper and better, the rise of fake Obamas, Trumps, and Ukrainian presidents seemed unstoppable. But then a coalition of 750 camera, software, news, and social-media companies got together to embrace an ingenious way to shut the deepfakers down—not by detecting when videos are fake, but by offering proof that they’re real.

Guests: Dana Rao, chief counsel and executive vice president of Adobe; Eric Horvitz, chief scientific officer, Microsoft.

Episode transcript

Intro

Theme begins.

Deepfakes are phony videos of real people, generated by artificial intelligence software to undermine our trust. These days, deepfakes are so realistic that experts worry about what they’ll do to news and democracy.

OBAMA: We’re entering an era where our enemies can make it look like anyone is saying anything at any point in time.

It was all looking pretty hopeless—until a counterforce came together, made up of Adobe and Microsoft—and 750 of their closest corporate friends.

I’m David Pogue, and this is “Unsung Science.”

First Ad

Season 2, Episode 6. Deepfakes: Big Tech Fights Back.

If you were online in 2021, there’s a good chance you saw one of those viral Tom Cruise deepfakes:

Cruise deepfake clip. https://youtu.be/iyiOVUbsPcM?t=76

CRUISE: I’m gonna show you some magic. The real thing (laugh).

I mean it’s a little dumb that I’m playing that for you, because what you heard there is actually not computer-generated. So far, most deepfakes are AI-generated fake video. And in that Tom Cruise deepfake, the voice isn’t synthetic. The Tom Cruise voice there is an actor. I mean, a different actor. Not Tom Cruise. I really hope this makes sense.

Anyway.

If you had the good taste to listen to season 1 of “Unsung Science,” you may remember an episode about the rise of voice deepfakes. Here’s a synthetic rendition of me, closing out that episode:

DPFAKE: Well, in the end, voice synthesis is just another technology. What happens from here isn’t about the tool; it’s about whoever’s wielding it.

Now that’s what I call…a Dpfake! Get it? David Pogue, DP, my initials? Dpfake? Sounds kinda like deepfake?

OK.

Now, so far, most of the deepfake videos we’ve seen have clearly been intended for entertainment purposes. There was the Obama one…

OBAMA: Now, you see, I would never say these things. At least not in a public address.

A lot of Donald Trump ones…

TRUMP: I had no doubt I was going to win. No one was against me, the entire country’s with me, I have to tell you. Tremendous.

And here’s a decent Morgan Freeman…

MORGAN: What is your perception of reality? Is it the ability to capture, process, and make sense of the information our senses receive?

So those are just stunts. Just for fun. But every now and then, somebody makes a deepfake that’s really meant to trick us. Usually for political reasons.

For example, there was a deepfake of Ukrainian president Volodymyr Zelenskyy, where he called on his armed forces to lay down their arms and surrender to Russia:

VLADIMIR: [use your favorite sentence from the clip]

And there was the time someone slowed down a video of Speaker of the House Nancy Pelosi by 25%, to make it look like she was drunk:

NANCY: We want to give this president the opportunity to do something historic for this country.

Of course, that wasn’t technically a deepfake—no new audio or video was generated, and no AI was involved—but it was manipulated video, designed to mislead us for political goals.

As this Microsoft YouTube video points out,

HENRIQUE: Advances in computer graphics and machine learning have led to the general availability of easy-to-use tools in modifying and synthesizing media. And that threatens to cast a doubt on the veracity of our media. At what point do we question everything we see and hear?

OK, so why don’t we just get a bunch of Big Tech geniuses to write software that can detect when a video is fake, and flag it?

Oh, trust me—they’ve tried! Adobe, Facebook, MIT, Microsoft, and the Department of Defense, among many others, have all tried to create deepfake detection software—and none of them really worked. At one point, Amazon, Facebook, Microsoft, and a bunch of researchers collaborated on a Deepfake Detection Challenge, where they gave away a million dollars to the winners of a contest to develop the best detection algorithm. The first-place winner managed to guess “real or fake” correctly—65% of the time. Wow man.

And yeah—detection software will get better. Unfortunately, so will the deepfakes themselves.

RAO: 15:03:28 / we met with– our image and AI scientists. / We said, “Can we detect this automatically?”

This is Dana Rao, chief counsel and executive vice president of Adobe.

RAO: 15:03:41 / The problem is the technology to detect AI is developing. And the prod– and the technology to edit AI is developing. And there’s always gonna be this horse race of which one wins.

15:04:13 / And it’d have to be at 100% accuracy rate or no one would believe it./

15:03:55 And so we know that for a long-term perspective, AI is not going to be the answer.

In other words, trying to write software that will detect a deepfake is hopeless. Sure, our detection software is going to get better, but meanwhile the deepfake-creating software will have gotten even better…and on and on we’ll go forever, as trust collapses and democracy dies.

Bring up closing music: we’re going to make listeners think this is the end.

So I guess the bad guys win this time. Such a bummer! I’m David Pogue, and this is “Unsung Science.”

UNSUNG SCIENCE with David Pogue is presented by Simon & Schuster and CBS News, and produced by PRX Productions.…

End music

PSYCH! You didn’t really think I’d leave you depressed like that, did you?

Nope. This is not going to be another news item to bum you out. This, it turns out, is a good-news story.

It starts with Adobe’s engineers suddenly having a eureka moment: They would not even try to detect deepfakes.

RAO: So / we flipped the problem on its head./ what we really need is to provide people a way to know what’s true, instead of trying to catch everything that’s false./

POGUE: 15:10:08 So you’re not out to develop technology that can prove that something’s a fake. This technology will prove that something’s for real.

RAO: 15:10:16 Absolutely. That’s exactly what we’re trying to do. / 15:17:04 It is a lie detector for photos and videos.

Here’s Adobe’s YouTube video about the idea:

NARR: When a photojournalist captures an image, they can choose to preserve its key facts. Like who shot it, where they were, and when it happened. Then, when the image appears on screens all around the world, its history moves with it. And if anything was changed along the way, everyone can see.

Now, Adobe’s a natural company to be worried about fake photos and videos—after all, they’re responsible for selling some of the most popular tools for editing photos and videos, like Photoshop, Premiere, and After Effects. Adobe called its project …the Content Authenticity Initiative. The CAI.

But here’s the twist: At the same time, unbeknownst to Adobe, right up the west coast, another company was working away at the same problem—with exactly the same approach.

POGUE: 12:35:05;06 Before we start, would you spell and pronounce your name for our records?

HORVITZ: 12:35:08;12 It’s Eric Horvitz. / Just two syllables to all the Horowitzes out there that want (LAUGH) me to be different. (LAUGH)

POGUE: 12:35:19;23 And you’re chief scientific officer?

HORVITZ: 12:35:21;24 Yes. Of Microsoft.

Horvitz is no stranger to tackling gigantic ugly problems in the digital world. He is also the co-inventor of the email spam filter.

POGUE: 12:37:42;24 You are?

HORVITZ: 12:37:43;16 Yeah. Yeah.

POGUE: 12:37:43;18 I didn’t know that. Wow. (LAUGH)

HORVITZ: 12:37:45;02 Yeah. Back in 1998.

POGUE: 12:37:46;17 Oh, my gosh.

He, too, had concluded that creating a deepfake-detecting AI would turn into a futile arms race.

Note: Tightening up his pauses would be most appreciated!

HORVITZ: 12:38:13;13 It’s kinda AI versus AI neck and neck. You use ‘em to create and to detect. And it’ll never be a reliable way to win.

12:38:42;19 And seeing the world going in that direction– I had to jump outside of AI for solutions. And I– I basically brought together several teams across Microsoft at– at the time. / to sit in front of white boards and figure out, like, “How can we solve this problem? Is there a way out?”

12:39:01;24 / what we came up with was a– way to use cryptography, distributed databases, notions of signing—content, / to certify the source and life history of a piece of media as it travels around/ such that consumers/ will get to see right away– what’s the source, origin, and history of edits– / to understand that– a piece of content comes from whom it says it comes from, as opposed to– having to guess.

POGUE: 12:41:22;22 / So (LAUGHTER) / attached to a picture or video is going to be this invisible certificate of authenticity and provenance. Is that a good way to describe it?

HORVITZ: 12:41:37;24 Absolutely. / think about it as– a– certification that /- a trusted source of the information– who published it– certifies exactly what you’re seeing. And if not, you can see who’s changed it. /

Microsoft refers to this invisible document as a manifest—as in “manifest,” the document that lists what’s on a ship or who’s on a plane. Sometimes you’ll hear it referred to as the provenance of a photo or video, as in “provenance,” the document that lists who’s owned a certain painting. Microsoft called its initiative—Project Origin.

HENRIQUE: We’re forming a coalition of many institutions, including the BBC, the CBC Radio Canada, the NY Times, and Microsoft. We call this coalition, Project Origin.

So now we had two competing programs. Adobe with its Content Authenticity Initiative, and Microsoft with its Project Origin. Well, that’s just what we need. Another format war, like VHS-Betamax. iPhone/Android. Playstation/Xbox. Mac/Windows. Coke/Pepsi.

Ah, but there’s a delicious twist. See, neither company plans to make money from their inventions. I heard that from Adobe…

RAO: 15:06:21 / This is an open standard. We founded this to be open so anyone can build this technology. / We’re not charging separately for it. No one is. This is something we’re all doing to come together for the common good.

…and from Microsoft.

HORVITZ: 12:50:04;24 /Microsoft has a– long tradition of being very interested in not just– technology, but socio-technical issues. The influence of technology on people and society. / As chief scientific officer, I oversee– a whole set of projects in the area of technical responsibility, including being responsible about advances in AI, artificial intelligence– and its influences on– on– on organizations, on people and society/.

And since this provenance business wasn’t going to be a profit center for either company, they did something amazing: They decided to work together.

HORVITZ: 13:08:31;19 Project Origin / coming together with the– the Content Authenticity Initiative, led by Adobe, bringing those teams together, forming a standards body– called C2PA, the Con– the Coalition for Content Provenance and Authenticity, generating a standard now being sucked in by all these companies who say, “My tools will talk to that standard.”

POGUE: 13:09:14;12 What’s really impressive is that you’re / mentioning software companies, Microsoft, Adobe who are, in some realms, competitors. You’re saying that they all laid down their arms to work together on something to save democracy.

HORVITZ: 13:09:33;12 Yeah. /we do see, I wouldn’t say necessarily competitors, but– groups working together across the larger ecosystem.

OK, I’d like to insert here a little side note about novelists: I hate it when writers give several of their characters names that start with the same letter. Like, across 500 pages, I’m supposed to remember who’s Caleb, Calvin, Caden, and Cameron?

Well, we’re gonna have a similar situation here. Adobe’s coalition was called the Content Authenticity Initiative, and now the joint venture with Microsoft is called the Coalition for Content Provenance and Authenticity. And to make matters worse, the actual manifest, the data that describes the history of this photo and video—is called? Are you ready? The Content Credentials.

Sigh.

Anyway. Dana Rao gave me a demo on his laptop to show me how it’ll work.

RAO: 15:52:37 Imagine you’re scrolling through your social feeds. The inevitable cat picture. And you see something you’re not quite sure of. Someone sent you a picture of– snowy pyramids. And they told you that the scientists found them in Antarctica. And that’s not what your fifth grade teacher told you.

So at this point, he’s showing me a website that’s very obviously Facebook—or, rather, very obviously supposed to look like Facebook. There’s the familiar blue banner across the top, except that in this case, instead of saying “Facebook,” it says “My Social Feed.” Oooh, clever.

One of the posts shows a photo of what looks like the three famous Egyptian pyramids—except they are now covered in snow.

When he mouses over the photo, we see a tiny icon appear in the upper right. It looks like a lowercase letter I, as in “info,” inside a little white circle.

RAO: 15:49:27 / And so you click on this button. C– it’ll s– it’s the Content Credentials. And you’ll get this information.

When he clicks it, a new window opens, with the snowy pyramids in the middle, and information panels on either side.

RAO: 15:49:42 So here you’re gonna be able to see on the left the image that you clicked on. And then you’re gonna see the original image there. /

15:49:58 And you’re gonna see exactly on the left what edits were made. AI tools were used. There were color adjustments that were made. Photoshop was used. You’re gonna see some information about when it was made and where it was made. In this case– Cairo, Egypt, not Antarctica, (LAUGHTER) shockingly.

Now, the photo has a vertical divider splitting it in half. It’s like an adjustable split screen, showing the same photo before and after it was manipulated. He’s dragging the slider left and right, so I can see more or less of the original photo.

RAO: 15:50:16 /And you’re just gonna be able to take this slider and you’re gonna be able to see the edits just like that.

15:53:19 / people are still fighting today about whether or not that iconic shot of Neil Armstrong was true or not. So imagine you had Content Credentials there. It’d say Buzz Aldrin took it and it’d say it was taken in space and we wouldn’t be having this debate anymore. (LAUGHTER)

POGUE: 15:53:41 Sh– it would tell you what camera, right? The iPhone point-zero-zero-one! (LAUGHTER)

So that’s Content Credentials: the future of the fight against deepfakes. It’s brilliant, it’s un-fakeable, it’s futureproof—and, as I see it, it’s hopelessly flawed. In five different ways. We’ll get to that—after the break. Patience, grasshopper.

Break

OK: What we’ve established so far is that Microsoft and Adobe teamed up to ruin the lives of the scumbags who create deepfakes. And the way they plan to do that is by invisibly embedding, in every photo, video, and audio clip, a manifest—a document that shows where this bit of media came from, and how it’s been changed en route to your screen.

Seems like a rock-solid concept—except for the five little flaws I could see immediately.

The first one that jumped out at me is this: That’s great that Microsoft and Adobe are teaming up to make this happen. But there are a lot of players in a video’s journey from camera to your eyeballs. Like, somebody shot it, then transferred it to their computer, edited it, posted it on Facebook. It then got cropped to fit TikTok and Instagram, it went viral, and then the New York Times posted it on its website.

If this Content Credentials business is going to work, the manifest would have to remain attached to that video through that entire journey, and remain attached every time someone re-encoded or re-formatted it for another website or app. You’d have to get the camera involved. The editing software. Facebook. TikTok, Instagram. The news site. The social-media site.

POGUE: 15:05:01 But doesn’t that mean that the New York Times and the Washington Post and Twitter and so on would all have to be on board with this?

RAO: 15:05:08 Absolutely. / we all have to come together to solve this. 15:07:24 / We’re talking to everybody.

Well, guess what? Getting every single company on board was exactly the strategy. Beginning in 2018, Adobe and Microsoft started approaching these companies one at a time.

RAO: 15:05:32 / like Qualcomm, who’s a chip maker who makes chips that go into smartphones. Arm, also a chip maker making chips that go into smartphones. And then we have the Washington Post. We have U.S.A. Today. We have the BBC. /

And Nikon, the Wall Street Journal, the BBC, Twitter—all told, 750 companies are involved so far, representing every conceivable piece of the pipeline from the camera to your retinas.

Although…You know who’s noticeably absent from the coalition? Facebook. Figures.

RAO: 15:13:31 So the newspapers and the media have been the most interested and excited about this. And I believe that’s because this is based on transparency. / What we wanna get out of the business of, is the governments or tech platforms being the arbiters of truth, making those decisions of saying, “You should believe this,” or, “You shouldn’t believe this,” rather than getting to that, “We’ll show you what happened. And then you decide.”

OK. But I do have a second concern about all of this. And that is that the people who are trying to manipulate you—aren’t going to use the Content Credentials thing! They’ll go right on creating fake videos and photos—that just won’t have the little Content Credentials button on ‘em. So how does this help us?

I asked Adobe’s Dana Rao.

POGUE: 15:53:13 And if that little button in the top right isn’t there, then what do I conclude?

RAO: 15:53:19 You would say, “I think this person may be trying to fool me.” (LAUGHTER) 15:17:09 / The bad actors, they’re not gonna use this tool. They’re gonna– / they’re gonna make up something. And they’re gonna doctor it. / But they’re not gonna f– show Content Credentials. They’re just gonna show you that finished product. And then you’re gonna have to say to yourself, /

15:09:46 Why didn’t they wanna show me their work? Why didn’t they wanna show me what was real, what edits they made? Because if they didn’t wanna show that to you, maybe you shouldn’t believe them.

POGUE: 12:44:21;00 / let’s say it’s five years from now and I’m scrolling through Facebook and here’s a picture of the president saying, “I like to murder baby animals,” and that icon isn’t there. So what should my reaction be?

HORVITZ: 12:44:52;07 I think immediate skepticism. And I want to be in that world.

In other words, there will be videos that don’t have the Content Credentials button—but at least you’ll know not to put your trust in ‘em. You won’t click like, or share, or forward, or get outraged, unless you see something that does have the button.

OK, so here’s the third obvious flaw:

POGUE: 13:02:57;04 If I’m the bad guy, why can’t I fake the manifest as well? Why can’t I put a fake icon on my fake video?

HORVITZ: 13:03:08;07 Well, the icon itself doesn’t come from the video. It comes from– a pipeline– / that you might say is secretly or cryptographically embedded in the video in an indelible way.

POGUE: 13:03:27;10 / can’t I fake that? / just like they do phishing websites that look just like the Bank of America?

HORVITZ: 13:03:43;07 We are very wary about various kinds of attacks of the whole model like that. / And we designed the technology so that that is impossible. If we didn’t do that, we’d be in the same hot water.

Behind the scenes, Content Credentials involves a lot of complicated cryptographic shenanigans that would make your eyes glaze over if you majored in anything besides computer science.

But in a Microsoft video for developers, distinguished engineer Paul England explains it like this:

PAUL: The final thing we need to do is make sure that nobody else can make a manifest for the video that says it’s something other than it is. And the way we do this is with something called a digital signature.

A digital signature is very much like a handwritten signature, but sort of better. Handwritten signatures are meant to prove who wrote a particular document. The digital signature does exactly the same thing, but it’s based on a cryptographic key. And as long as the publisher keeps that key secret, so no one else can get at it, then we know of no way that somebody else can forge a digital signature for a manifest.
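To make that description a little more concrete, here is a minimal Python sketch of signing and verifying a manifest. It is a toy: it uses textbook RSA built from two well-known Mersenne primes, with no padding, so it is far too weak for real use, and it should not be read as the scheme Content Credentials actually uses. The manifest fields and the function names `sign_manifest` and `verify_manifest` are made up for illustration.

```python
import hashlib

# Toy key pair (illustration only, NOT secure): two known Mersenne primes.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, kept secret by the publisher


def _digest_as_int(manifest: bytes) -> int:
    """Hash the manifest and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(manifest).digest(), "big") % N


def sign_manifest(manifest: bytes) -> int:
    """Publisher side: only the holder of D can produce this value."""
    return pow(_digest_as_int(manifest), D, N)


def verify_manifest(manifest: bytes, signature: int) -> bool:
    """Viewer side: anyone can check the signature with the public (N, E)."""
    return pow(signature, E, N) == _digest_as_int(manifest)


# Hypothetical manifest contents, loosely based on the snowy-pyramids demo.
manifest = b'{"publisher": "Example News", "captured": "Cairo, Egypt"}'
signature = sign_manifest(manifest)

forged = b'{"publisher": "Example News", "captured": "Antarctica"}'
print(verify_manifest(manifest, signature))  # the untouched manifest checks out
print(verify_manifest(forged, signature))    # the altered manifest does not
```

The point of the sketch is the asymmetry: checking a signature needs only public numbers, while producing one needs the secret key.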

As a handy bonus, the plan is to store the manifests on a public blockchain—a tamper-evident public database that can be infinitely duplicated and examined. It’s exactly the same trust mechanism that makes possible cryptocurrencies like Bitcoin—just being used in a clever new way.
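The “tamper-evident” part can be sketched in a few lines of Python. This is only a hash chain, the core idea underneath a blockchain, with none of the networking or consensus that a real public ledger involves. The record contents and the helper names `append_record` and `chain_is_intact` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the very first record


def _record_hash(payload: str, prev_hash: str) -> str:
    """Fingerprint of one record, tied to the fingerprint of the record before it."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(chain: list, payload: dict) -> None:
    """Add a record whose hash depends on everything that came before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = json.dumps(payload, sort_keys=True)
    chain.append({"payload": body, "prev": prev_hash,
                  "hash": _record_hash(body, prev_hash)})


def chain_is_intact(chain: list) -> bool:
    """Recompute every hash; any edit to an earlier record breaks a link."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _record_hash(record["payload"], prev_hash):
            return False
        prev_hash = record["hash"]
    return True


ledger = []
append_record(ledger, {"media": "pyramids.jpg", "action": "captured"})
append_record(ledger, {"media": "pyramids.jpg", "action": "color adjusted"})
print(chain_is_intact(ledger))   # True for the untouched ledger

# Quietly rewrite history in the first record.
ledger[0]["payload"] = ledger[0]["payload"].replace("captured", "found in Antarctica")
print(chain_is_intact(ledger))   # False once an old record is rewritten
```

Because anyone holding a copy can rerun the check, a rewritten record shows up as a broken link rather than going unnoticed.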

OK, potential flaw number 4: That’s all great for major web sites. But what about us ordinary schmoes? We would like to be trusted, too…

POGUE: 15:09:04 / So let’s say that I’m not a journalist. I’m just somebody who pulls out my phone when there’s a fight on a plane, and I wanna post it to Facebook. I don’t use any Adobe software. / Am I able to be part of this chain?

RAO: 15:09:28 Absolutely. If you’re using a smartphone and– or a camera, one of the partners that are part of the content authenticity sh– ‘nitiative who– who have implemented this technology into their phone or camera, you’ll be able to select that button before you take that video to say, “I want this captured.”

15:09:46 And once you capture it, you can publish it.

Ooooooh-kay…. Well, I look forward to the new generations of phones that have Content Credentials built into every photo and video.

Well, what about objection #5—privacy?

RAO: 15:10:38 / this is an opt in solution. You choose whether or not you wanna capture this. If you said, “Hey, I don’t want anyone to know what I’m doing to this image,” that’s fine. On the other hand, if you’re taking– a really important, news-worthy picture, and you decide not to select it, you just have to understand that people may not believe that what you took was real.

POGUE: 15:11:06 / Will there be the opportunity for me to turn it on at the time I take the picture, and then change my mind before I release it into the wild?

RAO: 15:11:15 You will be able to delete the Content Credentials. So remember, Content Credentials is what we’re calling the metadata that gets associated with the image that goes along with it. And you can choose to say, “No, I changed my mind. I don’t wanna do it anymore.” And it won’t go travel with it. And people will just look at the image and they’ll just not know what happened to it.

I’ve been trying to avoid the nerd-out term “metadata,” but it’s gonna be hard to avoid in this discussion. Metadata refers to the invisible data that’s attached to every photo, video, and audio recording. Like the time and date you captured it, or where. That metadata tells your photos app how to sort your pictures chronologically or by location, for example.

And Content Credentials, as it turns out, are just a glorified new kind of metadata.

RAO: 15:12:14 What we’re adding, which is– is important, is we’re adding the edits that you made to the image. So you’re gonna be able to see, what did they do to that image? Did they change the lighting or did they take someone’s head and swap it for someone else’s head? We’re gonna know that. /
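As a rough illustration of metadata that also carries an edit history, here is a Python sketch. The field names are invented for this example and are not the actual Content Credentials format. The only idea it shows is that each recorded edit can note a fingerprint of the resulting file, so a viewer can tell whether the image in front of them matches the last recorded step.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(image_bytes: bytes) -> str:
    """A short, fixed-size stand-in for the full image contents."""
    return hashlib.sha256(image_bytes).hexdigest()


@dataclass
class Manifest:
    # Hypothetical fields: the familiar who/where metadata...
    creator: str
    location: str
    # ...plus the new part, a running list of edits.
    edits: list = field(default_factory=list)

    def record(self, tool: str, action: str, image_bytes: bytes) -> None:
        """Log one step and the fingerprint of the image it produced."""
        self.edits.append({"tool": tool, "action": action,
                           "result": fingerprint(image_bytes)})

    def matches(self, image_bytes: bytes) -> bool:
        """Does this image correspond to the last step in the history?"""
        return bool(self.edits) and self.edits[-1]["result"] == fingerprint(image_bytes)


original = b"raw pixels of three pyramids"
edited = b"raw pixels of three pyramids, now with snow"

m = Manifest(creator="Example Photographer", location="Cairo, Egypt")
m.record("camera", "captured", original)
m.record("Photoshop", "AI-generated snow added", edited)

print(m.matches(edited))                       # True: history accounts for this file
print(m.matches(b"some other doctored file"))  # False: no recorded step produced it
```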

POGUE: 15:17:44 I mean, on one hand, / I think that it’s super ambitious to imagine that every citizen’s gonna learn to use this and every camera company and software company’s gonna adopt it. On the other hand, PDF! Adobe (LAUGHTER) invented PDF and now everybody in the world knows what it is and uses it. So what is your current expectation of its ultimate adoption?

RAO: 15:18:13 Yeah. We’re very optimistic. I mean, Adobe as– as you mentioned, we’re the world’s leading creative company, right? We have millions of users who use tools like Photoshop every single day./

15:18:30 So we have global reach. / But this is not an Adobe solution only. /

At this moment, Adobe apps like Photoshop and Behance already have Content Credentials features up and running.

I realize that you’re not seeing that little lowercase-i-in-a-circle on social media yet; as 2022 wraps up, all of this is still in beta testing. But Dana Rao says we won’t have to wait long.

RAO: 15:24:18 / the very first step was getting this into Photoshop and getting working code into Photoshop. So we’ve passed that milestone. / 15:24:41 So I think next year’s a big year /.

I sure hope so. Because, as Microsoft’s Eric Horvitz points out…

HORVITZ: 12:43:47;20 / within five or ten years, if we don’t have this technology, most of what people will be seeing or quite a lot of it will be synthetic. We won’t be able to tell the difference.

I should also stick the silver-bullet disclaimer in here. Content Credentials could be a huge step toward shutting down those misinformation slimeballs. But the experts don’t think technology will solve the deepfakes problem in one shot.

HORVITZ: 12:45:13;22 / it’s not gonna be one answer. Let me just say that we have to do many things. We have to think through media education, literacy, skepticism– understanding technologies like this / technology we’re talking about. /

In other words, we have to start upgrading our baloney detectors.

HORVITZ: 12:56:02;24 It’s also gonna be / government. / I served on a committee called the National Security Commission on AI where we actually talked about this. But now there’s a bill in c– in Congress– bipartisan bill– introduced by Senators Peters and Portman called the Deepfake and Provenance Act.

12:56:27;00 And so this– this– a bill– it will call for a task force to study, “Well, what’s the government’s approach to this?” /

But in the end, Dana Rao thinks they might actually pull off something that people once thought was impossible: ending the deepfake reign of terror.

RAO: 15:18:45 / we really feel the momentum is– is forward and– and confident that this will work. Because it’s the right solution. It’s the solution we need. And there’s no turning back. /

Microsoft’s Eric Horvitz, too:

POGUE: 13:10:17;15 Wow. So this thing could work?

HORVITZ: 13:10:20;04 I think it has a chance of making a dent. Potentially a big dent in / this challenge of our time.

Well…here’s to Content Credentials, big tech doing the right thing…and making big dents in big problems.

UNSUNG SCIENCE with David Pogue is presented by Simon & Schuster and CBS News, and produced by PRX Productions.

Executive Producers for Simon & Schuster are Richard Rhorer and Chris Lynch.

The PRX production team is Jocelyn Gonzales, Morgan Flannery, Pedro Rafael Rosado and Morgan Church.

Jesi Nelson composed the Unsung Science theme music, and our fact checker is Kristina Rebelo. Special thanks to Olivia(?)

For more on the show, visit unsungscience.com. Go to my website at DavidPogue.com or follow me: @Pogue on your social media platform of choice. Be sure to like and subscribe to Unsung Science wherever you get your podcasts.
