How Apple and Microsoft Built the Seeing-Eye Phone

Season 1 • Episode 7

Your smartphone can see, hear, and speak—even if you can’t. So it occurred to the engineers at Apple and Microsoft: Can the phone be a talking companion for anyone with low vision, describing what it’s seeing in the world around you?

Today, it can. Thanks to some heavy doses of machine learning and augmented reality, these companies’ apps can identify things, scenes, money, colors, text, and even people (“30-year-old man with brown hair, smiling, holding a laptop—probably Stuart”)—and then speak, in words, what’s in front of you, in a photo or in the real world. In this episode, the creators of these astonishing features reveal how they turned the smartphone into a professional personal describer—and why they care so deeply about making it all work. 

Guests: Satya Nadella, Microsoft CEO. Saqib Shaikh, project lead for Microsoft’s Seeing AI app. Jenny Lay-Flurrie, Chief Accessibility Officer, Microsoft. Ryan Dour, accessibility engineer, Apple. Chris Fleizach, Mobile Accessibility Engineering Lead, Apple. Sarah Herrlinger, Senior Director of Global Accessibility, Apple.

Episode transcript

Intro

Your smartphone can see, hear, and speak—even if you can’t. So the accessibility engineers at Apple and Microsoft wondered: Could the smartphone ever be smart enough to serve as a talking camera for people who are blind or have low vision? Could it describe what it’s seeing in the world around you, or photos in front of you?

App: A group of people sitting around a table playing a board game. 

Jenny: Seeing AI is one of the most incredible revolutionary products I think we've ever put out there. I get emotional when I think about what employees have created.

Today, the origin stories of two amazing accessibility features from Microsoft and Apple. I’m David Pogue, and this is “Unsung Science.”

Season 1, Episode 7: How Apple and Microsoft Built the Seeing-Eye Phone. We’re releasing this episode on December 3, the International Day of Persons with Disabilities, which the United Nations created in 1992. And the reason that’s so appropriate will become obvious within the next 90 seconds. 

About eight years ago, I was hired to host a panel at a corporate event. And backstage, I spotted one of the other panelists waiting. She was using her iPhone in a way I’d never seen before. 

App: Voiceover fast

Her screen was off—it was just black—and she was sliding her finger around. She was using VoiceOver, Apple’s screen-reading feature for blind people like her. It speaks the name of every icon, button, list item, and text bubble beneath your finger. Over time, she’d gotten so good at it that she’d cranked the speaking rate up. It was so fast that I couldn’t even understand it.

App: Voiceover fast

She gave me a little demo—and it occurred to me that her phone’s battery lasts twice as long as mine, because her screen never lights up—and of course her privacy is complete, since nobody can see anything on her screen. She usually uses earbuds.

But VoiceOver was just the tip of the iceberg. Turns out Apple has essentially written an entire shadow operating system for iPhones, designed for people with differences in hearing, seeing, muscle control, and so on. The broad name for these features is accessibility.

Now, in popular opinion these days, the big tech companies are usually cast as the bad guys. But Apple and Microsoft have entire design and engineering departments that exist solely to make computers and phones usable by people with disabilities. And they’re totally unsung.

Sarah Our first Office of Disability was actually started in 1985, which was five years before the Americans with Disabilities Act came to pass.

Sarah Herrlinger is Apple’s Senior Director of Global Accessibility. She describes her job like this:

Sarah My job is to look at accessibility at the 30,000-foot level, and make sure that any way Apple presents itself to the world, we're treating people with disabilities with dignity and respect.

David What is the business case to be made for designing features that are, at their core, intended for a subset of your audience?

Sarah You can look at public statistics that tell us that 15 percent of the world’s population has some type of disability. 

— that number grows exponentially over the age of 65.  Whether you think so or not, you’re probably going to be turning on some of these features. 

But for us, we don’t look at this as a traditional business case, and don’t focus on ROI around it. You know, we believe it’s just good business to treat all of our customers with dignity and respect, and that includes building technology that just works no matter what your needs might be. 

I think you might be surprised how far these companies go. Like on the Mac: they’ve added this entire spoken interface, so that you can move the mouse, click, drag, double-click, Shift-click, open menus, type, edit text—all with your voice. Same thing on the iPhone and iPad. Very handy if you can’t use your hands—whether they don’t work, or just because they’re full of groceries. Or greasy. 

Here’s the Apple ad that introduced this Voice Control feature. It’s basically a demo by outdoor enthusiast Ian Mackay, who’s paralyzed from the neck down. Here he is, opening the Photos app and choosing a photo to send as a text message to his buddy:

APP: Voice control audio

IAN: Wake up. Open Photos. Scroll up. Show numbers. 13. Click Share. Tim. Next field. Let’s ride this one today. Thumbs-up emoji. Click Send.

Did you hear that “Show numbers” business? That’s how you tell the machine to click something on the screen that’s unlabeled—like the thumbnails of your photos. When you say “show numbers,” little blue number tags appear on every single clickable thing on the screen, and you just say the one you want to click.

IAN: Show numbers. 13.

But I mean, there are also features for people who have trouble hearing. For example, your phone’s LED flash can blink when you have a notification. There’s another feature that lets you use your iPhone as a remote microphone—you set it on the table in front of whoever’s talking, and it transmits their voice directly into your AirPods or hearing aids.

And there’s a feature that pops up a notification whenever the phone hears a sound in the background that should probably get your attention, like the doorbell, a baby crying, a dog or a cat, a siren, or water running somewhere. Here’s my cat Wilbur and me testing out this Sound Detection feature:

SOUND: Wilbur

And sure enough! My phone dings and says, “A sound has been recognized that may be a cat.”

If you’re in a wheelchair, the Apple Watch can track your workouts. If you’re color-blind, like me, there’s a mode that adjusts colors so you can distinguish them. If you’re paralyzed, they’ve got features to let you operate the Mac with a head switch, blink switch, joystick, or straw puffer. They’re even trying to make the Apple Watch useful if you can’t tap it. Here’s Sarah Herrlinger:

Sarah We brought Assistive Touch to Apple Watch as a way for individuals who have upper body limb differences. So someone who might be an amputee, or have another type of limb difference, to navigate and to use the device — without ever having to touch the screen itself. 

I tried it out. Each time you tap two of your fingers together, on the arm that’s wearing the watch, the next element on the watch’s screen lights up. Tap, tap, tap—and when you get to what you want to open or click, you make a quick fist to click it. 

And it’s not just Apple. A few years ago, for a story on “CBS Sunday Morning,” I interviewed Microsoft CEO Satya Nadella. And at one point, he described how his life changed the day his son Zain was born—with quadriplegia.

NADELLA: Even a few hours before Zain was born, if somebody had asked me, “What are the things that you are thinking about?” I would have been mostly thinking about, “How will our weekends change?” And about childcare and what have you.

And so obviously after he was born, our life drastically changed. To be able to see the world through his eyes and then recognize my responsibility towards him, that I think has shaped a lot of who I am today. 

POGUE: But how does something as emotional as that empathy that you’ve developed translate into something as nuts-and-bolts-y and number-crunch-y as running a huge corporation?

NADELLA: There’s no way you can motivate anyone if you can’t see the world through their eyes. There’s no way you can get people to bring their A game if you can’t create an environment in which they can contribute. But the creation of that environment requires you to be in touch with what are they seeking? What motivates them? What drives them? So as a leader, or as a product creator, you can draw a lot from, I would say, this sense of empathy.

So it’s no coincidence that later in our shoot, the one new Microsoft software product the CEO was most eager to show our cameras was something called Seeing AI. This was, by the way, a Microsoft app that runs only on the iPhone. From Apple.

Here’s a clip from that “Sunday Morning” story. Nadella and I are in a company snack bar with Microsoft engineer Angela Mills, who’s legally blind. She showed me how Seeing AI works. 

SOUND: Angela demo

MILLS:   So if I now hold it up…

DP (VO): It helps her read text…

PHONE: “Carob Malted Milk Balls.”

DP (VO): …recognize objects…

PHONE: Banana. Orange.

DP: That is oranges!

MILLS: Yup! And then…Take picture.

DP (VO): …and even identify faces.

PHONE: 49-year-old man with brown hair looking happy.

POGUE: Wow, it left out tall and handsome. But that’s pretty close! (laughter)

So today, you’re going to hear two stories. Backstories of how two very cool accessibility features came to be. By what’s probably no coincidence at all, both features were invented by disabled employees at the companies themselves—Apple and Microsoft.

Incidentally, this is the first episode of “Unsung Science” that involves commercial consumer corporations. There are a couple of others later this season. I just want to make clear that these companies did not, do not, and can not pay to be featured as a topic on this show; it wasn’t even their idea. In fact, I hounded them for interviews, they didn’t hound me. Sometimes, for-profit corporations make cool scientific or technical breakthroughs, too.

[pause/music]

OK. So our first story is that Seeing AI app from Microsoft. What a great name, too, right? Like seeing-eye dog, but Seeing AI, for artificial intelligence?

Anyway, the app has ten icons across the bottom; you tap one to tell it what kind of thing you want it to recognize. It can be text…

APP: Caution: Non-Potable Water.

Barcodes …

APP: Ronzoni Gluten-Free Spaghetti.

People…

APP: 30-year-old woman with red hair looking angry.

Currency…

APP: 10 U.S. dollars. One U.S. dollar.

Scenes…

APP: A bus that is parked on the side of the road.

Colors…

App: Gray and white. Red. 

Or handwriting.

App: Sorry Dad—I ate the last of your birthday cake.

There’s even a mode that uses low or high-pitched notes to tell you how dark or bright it is where you are, or as you move from room to room.

App: (Warbles low and high)

The app can also tell you what’s in a photo.

App:   One face. A woman kissing a llama.

The subject of that picture isn’t just any woman kissing a llama—it’s my wife, Nicki. She’s always had a thing for llamas.
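For the developers listening: Seeing AI’s own code isn’t public, but the basic “talking camera” idea is easy to sketch with Apple’s built-in frameworks, which Microsoft’s iPhone app sits on top of. Here’s a minimal, illustrative Swift sketch that recognizes printed text in a still image with the Vision framework and speaks it aloud. It’s an assumption about how such a feature could be built, not Microsoft’s actual implementation.

```swift
import Vision
import AVFoundation

// Kept alive outside the function so the speech isn't cut off mid-sentence.
let synthesizer = AVSpeechSynthesizer()

// A hypothetical helper: recognize printed text in a still image and speak it aloud.
// This illustrates the general "talking camera" idea, not Seeing AI's actual code.
func speakText(in cgImage: CGImage) {
    let request = VNRecognizeTextRequest { request, error in
        guard error == nil,
              let observations = request.results as? [VNRecognizedTextObservation] else { return }

        // Take the best candidate string from each detected region of text.
        let lines = observations.compactMap { $0.topCandidates(1).first?.string }
        guard !lines.isEmpty else { return }

        // Hand the result to the speech synthesizer, much as a talking-camera app reads labels aloud.
        synthesizer.speak(AVSpeechUtterance(string: lines.joined(separator: ". ")))
    }
    request.recognitionLevel = .accurate   // trade a little speed for accuracy

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}
```

The real app works on a live camera feed and does far more than read text; this just shows the two building blocks: recognize, then speak.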

Anyway. The man behind the app is Saqib Shaikh (pronounced “Sokkibb Shake”), whom Microsoft introduced in a YouTube video like this:

SAQIB: I’m Saqib Shaikh. I lost my sight when I was seven. I joined Microsoft ten years ago as an engineer. And one of the things I’ve always dreamt of since I was at university was this idea of something that could tell you at any moment what’s going on around you.

In 2014, he got his shot. Microsoft held its first-ever, company-wide hackathon—a programming contest to see which engineering team could come up with the coolest new app in one mostly sleepless week. And Saqib’s app won.

Saqib: I gotta say, it was very basic, very early.

It could read text, and do a bit of face recognition to help you identify your friends as you walk past, and a few other things like recognizing colors and so forth. But describing images—that didn’t come until a year or more later.

And, you know, it got some attention, but it wasn’t really till the next year we thought, “let’s do it again.” And over time, more and more people got involved. 

In 2015, a more mature version of Seeing AI won Microsoft’s hackathon again.

Saqib So when we won the second hackathon, the company-wide one, I told my manager, look, we have this opportunity. And he was just like, “OK, I am just going to give you two months to see what you could do full time.”

But then it never stopped being my project. And before we knew it, we were on stage with the CEO at the Build conference in 2016, which was a pivotal moment and just so incredible. 

That CEO was, of course, Satya Nadella, who brought Saqib onto the stage. 

NADELLA: It’s such a privilege to share the stage with Saqib. You know, Saqib took his passion, his empathy, and he’s gonna change the world.

[pause/music] 

Seeing AI relies on a form of artificial intelligence called machine learning.

Saqib So the way machine learning works is, you have these algorithms called neural networks, which are—you’re not giving them steps like “recognize this, do this, look for this type of color or this line.” Instead, you’re taking many, many examples, hundreds of thousands of examples, of different photos. And then someone is teaching the computer by describing it. And that could be writing a sentence about it, such as “this is a living room with such and such.” 

You’re teaching the system by giving it, “This is the real answer.” And then this so-called neural network will learn over many, many iterations that this is the concept that makes this thing over here a couch. And this is the thing that makes this other thing over here a car. 

David I see. So  does the machine learning write code, or is it just a black box and you’ll never really know why it thinks this banana is a banana? 

Saqib In many ways, it is a black box where the system has learned the association between this banana and the word banana. 
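To make that black-box idea concrete, here’s a tiny, illustrative Swift sketch that asks Apple’s general-purpose on-device classifier what it sees. The caller gets back labels with confidence scores and never sees the reasoning. VNClassifyImageRequest is an assumption chosen for illustration; it’s Apple’s built-in classifier, not the custom models behind Seeing AI.

```swift
import Vision

// A sketch of the "black box" idea: ask an on-device classifier what it sees and get back
// label/confidence pairs, with no visibility into why it decided that.
// (VNClassifyImageRequest is Apple's built-in classifier; Seeing AI's own models aren't public.)
func classify(_ cgImage: CGImage) {
    let request = VNClassifyImageRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])

    // Keep only the labels the model is reasonably confident about.
    let observations = (request.results ?? []).compactMap { $0 as? VNClassificationObservation }
    for observation in observations where observation.confidence > 0.3 {
        // e.g. "banana: 94%", a learned association, not a rule anyone wrote by hand
        print("\(observation.identifier): \(Int(observation.confidence * 100))%")
    }
}
```

That opacity is exactly what Saqib means: the association is learned from examples, not written down anywhere as rules.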

Saqib even mentioned a feature I’d missed completely: feeling your photos.

Saqib You can kind of feel the little clicks around the edges of objects and you hear that, “wow, there’s a car in the bottom left and there’s a house over on the right” and “oh, there’s my friend over there,” and you’re kind of tracing a photo with your finger on a flat piece of glass. 

And sure enough: In the Scene mode, if you tap the Explore button, you can run your finger over the photo on the screen and feel little vibrations—with accompanying sounds—as your fingertip bumps into objects in the picture.

App: Move your finger over the screen to explore. Couch. Table. Chair 1. Chair 2. Monitor. Desk. Keyboard instrument.

Sometimes, Seeing AI is incredibly, freakishly specific in its descriptions. I tried to baffle it with a photo of me standing in front of this giant wall of stacked shipping containers at a seaport. I figured it’d have no chance of figuring out what they were. But here’s what it said:

App: A man standing in front of a row of shipping containers.

Wow.

OK, something harder then. My daughter Tia has this hobby, where she carves incredibly lifelike images into pumpkins. Like, four-level grayscale photographic-looking carvings. One year, she carved actress Shailene Woodley’s face into a pumpkin. And in my phone’s photo roll, there’s a shot of her pumpkin next to the Shailene Woodley picture that Tia used as a model. And here’s what Seeing AI said: 

App: Two faces—probably a collage of Shailene Woodley.

I mean, what the ever-loving—how did it know that? It knew that the woman on the pumpkin was Shailene Woodley? And that there were two Shailene Woodleys—one in a photo, and one on a squash?

On the other hand, I should do some expectation setting here: The app is not flawless, and it’s not even complete—several of the recognition features are still labeled Preview. It sometimes gets easy ones wrong:

App: Probably a white horse with a long mane.

Uh, nope—it’s very clearly a llama. And especially in barcode mode, the app often just gives you a shrug:

App: Processing. Not recognized.

Finally, it’s worth mentioning that the most impressive descriptions arrive only if you let the app send a photo to Microsoft’s servers for analysis.

App: Processing. (musical notes) Probably a cat standing on a table.

Get down from there, Wilbur!

Still, Seeing AI has gotten better and better since Microsoft released it in 2017. Saqib says he uses it almost every day.

Saqib It can range from just reading which envelope is for me in the mail. Versus for my wife. In a hotel room to see, OK, all these little toiletry bottles, which one’s going to be the shower gel and the shampoo. You don’t want to get that the wrong way round! 

A lot of these things then puts me in the driving seat instead of having to ask someone for help. 

David Is there anything technologically that’s holding you back from making the app the dream app? 

Saqib I would love to be able to have AI that understands time, not only that this is a photo, but what’s going on over time, like, “the man’s walking down the corridor and just picked this up.”  And there are scientists around the world solving each one of these small problems. 

[Pause]

Jenny:  Seeing AI…Seeing AI is one of the most incredible revolutionary products I think we’ve ever put out there. I get emotional when I think about what employees have created. 

Jenny Lay-Flurrie is Microsoft’s chief accessibility officer. She’s deaf, so we chatted over Zoom with the help of her sign-language interpreter Belinda, which I thought was very cool. I started by asking her the same devil’s-advocate question I’d asked Apple’s accessibility head.

David Microsoft is  very proud of its work in accessibility.  From a business standpoint:  you’re doing a lot of expense and effort for a subset of potential customers. 

Jenny:  Well, I disagree, of course.  A billion people is not a subset.  Disability is an enormous part of —of community. It’s part of being human. 

David I mean, you say there’s a billion people who are helped by these technologies. Do you think the people who could use these features and these apps …know about these features and these apps? 

Jenny No, I don’t think that enough people know about what is available today with modern accessibility digital tools. So that comes back to, how do we educate and get it into people’s hands?

David Well, you could go on big podcasts and talk about it, that would help. 

Jenny Game on! Let’s do that!

After the ad break, I’ll tell you the story of a cool Apple accessibility feature I’ll bet you didn’t know existed. Meanwhile, let’s take a moment to acknowledge: Man, Microsoft and Apple both allowing their chief accessibility people to join in on the same podcast? I thought these companies were, like, arch-rivals?

But Jenny Lay-Flurrie surprised me. In accessibility, they have a kind of truce.

Jenny I would say that actually accessibility is one part of the tech industry where we don’t compete.  We collaborate.  On any given day, I’m chatting with my peers in all of the companies.  This is bigger than us. This isn’t about one company versus another. 

Inclusion is not where it needs to be, and technology is one powerful means to help address that. So, yeah, I would say this is industry-wide, and the maturity over the last five years has been incredible. Incredible! And it makes me, bluntly, just stupidly excited about where we’ll be five years from now.

I’ll be back after the ads.

Break – 

Before the break, you heard about Seeing AI, a Microsoft app for the iPhone for blind people that’s designed to describe the world around you. Apple has something like that, too. It’s an option that’s available when you turn on the VoiceOver screen reader. 

Once again, you could quibble with some of the descriptions, like in this photo of me in front of a fireplace:

Voice:              A photo containing an adult in clothing. 

Well, that narrows it down! Or how about this one:

Voice:              A person wearing a red dress and posing for a photo on a wooden bridge. 

Well, that’s all true—and the red dress part is really impressive. But I’d say the main thing in this photo is that she’s in a cave, surrounded by stunning white stalactites. And I know VoiceOver knows what those are, because in the very next picture, it says:

Voice:              A group of stalactites hanging from the ceiling of a cave.

But most of the time, VoiceOver scores with just unbelievable precision and descriptiveness. Listen to these:

Voice:              A person wearing sunglasses and sitting at a table in front of a Christmas tree. Two adults and a child posing for a photo in a wagon with pumpkins. A group of people standing near llamas in a fenced area. Maybe Nicki.

OK, what!? It not only got the llamas, but it identified my wife!? Well, I guess I know how it did that. The Photos app learns who’s in your photos, if you tell it—so this feature is obviously tapping into that. But it really got me there.

Oh—and VoiceOver Recognition even describes what’s in video clips.

Voice: Video: A person holding a guitar and sitting on a couch.

Chris: I totally agree that these machines are now doing things that are ridiculous.

Chris Fleizach is in charge of the team at Apple that makes all the accessibility features for iPhone, Apple TV, Apple Watch, and so on. Including that photo-describing business, whose official name is VoiceOver Recognition.

Chris (continuing): This is a combination of machine learning and vision processing to string together a full-sentence description. And so we can grab this screenshot, feed it through this machine-learning, vision-based algorithm, pop out a full sentence in under a quarter of a second—and do it all on device.

By “On device,” he means that all of this happens right on the phone in your hand. No internet needed. No sending images off to some computer in the cloud for processing, which is good for both privacy and speed.

Chris: Before, you could have done that, but you’d have to send it off to a server to be processed in some data farm and it would take 10 seconds to come back.  
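VoiceOver Recognition is a feature users switch on, not something developers call directly, and Apple’s captioning pipeline isn’t a public API. But the pattern it automates, attaching a spoken description to an on-screen image, is exactly what app makers can do by hand with UIKit’s accessibility properties. Here’s a minimal sketch; describeImage is a hypothetical stand-in for whatever captioning model you might have.

```swift
import UIKit

// VoiceOver Recognition does this automatically for unlabeled images. Developers can do the
// same thing by hand: whatever string lands in accessibilityLabel is what VoiceOver speaks
// when the user's finger reaches this image. `describeImage` is a hypothetical stand-in for
// a captioning model; Apple's on-device pipeline described above is not a public API.
func label(_ imageView: UIImageView, using describeImage: (UIImage) -> String) {
    guard let image = imageView.image else { return }

    imageView.isAccessibilityElement = true
    imageView.accessibilityLabel = describeImage(image)   // e.g. "A woman kissing a llama"
}
```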

Getting to this point didn’t happen overnight, of course. 

Chris: The image descriptions, that’s something that we started working on years and years ago, when I essentially saw a prototype that someone else was working on at Apple. And they said, “Well, look at this cool thing. I can take this photo and turn it into a sentence, and it’s sort of OK.” And I said, “We need that. We need that now. How do we make that happen?”

And, you know, four years later, with the involvement of 25 different people across Apple, we finally have this on-device image description capability. 

[pause]

But we are gathered here today to hear the origin story of a different accessibility feature. Not object recognition, but people recognition. Actually, not even that. People distance recognition. 

David How do you pronounce your name? 

Ryan Dour. 

David Okay. Like the word. 

Ryan Yes. Well, I frequently hear people say “dower,” but I’m definitely not a dour person. I’m a doer. 

David Oh, that’s very good. 

So this is Ryan Dour, D-O-U-R.

David So what is your actual job at Apple? I mean, you’re not — you’re not Idea Man. 

Ryan Well, no.  I work on the software engineering Accessibility Quality Assurance team. So my job at Apple is to test and qualify all of our accessibility features, and then to make sure that those features work with many of our products. 

Ryan says that, conveniently enough, Apple’s various accessibility features for low vision have evolved at just about the right speed to keep up with his own deteriorating vision.

Ryan When I was a kid,  I had some vision, and over my lifetime, as my vision went from low vision to, well, quite frankly, no vision, there’s — Apple’s always been sort of at that forefront of technology, such that it actually followed my progress. You know, I went from actually using, you know, Close View and Zoom to, “Oh, okay, my vision’s getting to the point where I can’t really use Zoom effectively anymore. Oh, but—but here comes Spoken User Interface Public Preview right on time.” 

Now, when you can’t see, it takes some ingenuity to navigate the world—and if you haven’t been there, some of the stickiest situations may not occur to you. And for Ryan, one of the most awkward social moments is standing in lines. How do you know when it’s time to shuffle forward, even if you’ve got your cane?

[Ambi]

Ryan So imagine we’re in line at a theme park. The person in front of me has moved up, but I don’t want to constantly be tapping them, and there’s tons of voices and lots of chatter going around, so I don’t necessarily hear that they have specifically moved up. The person behind me now is waiting. And they’re not noticing. When they finally do notice, they’re getting annoyed. I feel like a rubber band. I’m bouncing between the person in front of me, tapping them, stopping, and then the person behind me is — if I don’t move up quickly enough — saying, you know, “go ahead, move up.” And imagine doing that for an hour while you’re waiting for a rollercoaster.

With a dog, by the way,  you’re relying on your dog to wait in line. But their goal, as trained, is to actually get you around objects. So very frequently, if there’s a space, the dog will say, “OK,  it’s time to move up and around the people in front of you.” And that, that can become an issue as well.

OK. With that background, you’ll now be able to understand the significance of the meeting Ryan attended at Apple one day in the summer of 2019. 

That would be 2019, PC—pre-Covid.

Ryan We had this meeting with our video team and they were introducing us to a new technology. We all know it now as Lidar.

Lidar stands for ”light detection and ranging.” 

[music]

A lidar lens shoots out very weak laser beams—you can’t see ‘em, and they can’t hurt your eyes—and measures the infinitesimal time it takes for them to bounce off of nearby objects and return to the lens, like a bat using echolocation. That way, lidar can build a 3D understanding of the shapes and distances of things around it. It’s the same idea as radar, except using light to measure distance instead of radio waves.
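If you like seeing the arithmetic, the time-of-flight math fits in a few lines: a reflection that comes back after about ten nanoseconds corresponds to an object roughly a meter and a half away. The numbers here are illustrative, not the specs of Apple’s sensor.

```swift
// The time-of-flight arithmetic: the laser pulse travels out and back,
// so the one-way distance is (speed of light x round-trip time) / 2.
let speedOfLight = 299_792_458.0                  // meters per second

func distance(fromRoundTripTime seconds: Double) -> Double {
    speedOfLight * seconds / 2.0
}

// Example: a reflection that returns after 10 nanoseconds
// came from something about 1.5 meters away.
let meters = distance(fromRoundTripTime: 10e-9)   // roughly 1.5 m
```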

They put lidar on planes, aimed at the ground, to measure crop yields, identify plant species, or map archaeological features. They’ve used it to map the surface of the moon and the sea floor. A lot of self-driving car features rely on lidar—and so do the speed guns that the police use to give you speeding tickets.

But in 2020, Apple built lidar into the back of its iPhone 12 Pro. The iPhone 13 Pro has it, too. The big idea was that this lidar would permit software companies to create really cool augmented-reality apps—you know, where the screen shows realistic 3-D objects in the room with you that aren’t really there, like an elephant or a sports car, even as you move around or change angles. 

OK, back to Ryan Dour’s meeting at Apple. You’ll hear him refer to haptics—that’s the little clicky vibrations that smartphones make.

Ryan: They had this demo app that would provide haptics based on how far away the person was. Or other things—like, it wasn’t just people at the time, it was just simply the output of the Lidar sensor itself, provided in a haptic way. 

And we all sat around, a bunch of us from the accessibility team and video team. And we thought about “what are some really great things we could do with this?” And a lot of ideas were sort of thrown around in the— in the meeting. 

  But towards the very end of it, I remember we had this lightbulb moment where I said, “Hey, wait a minute. Let’s go out into the hall.” 

And so we went out into the hall at Apple Park, and I said, “OK”–I had an engineer stand in front of me and I said, “Let’s pretend we’re at a coffee shop and you’re in line in front of me.  Whenever you want to, at random, I want you to just go ahead and walk forward.” And I held out this—this app with this haptic prototype, and I could feel like bup bup bup bup bup boop boop… and it’s like, “Oh, OK, that person moved up.”  I said, “You know what? This has some serious practical uses. I think that this is going to be something we should really consider in the future.” And that was sort of the end of that meeting. 

It sounded like a cool enough idea—just not cool enough to act on immediately. The idea was filed away. 

But then, in March 2020, Ryan’s idea got a big, hard push from a source that nobody saw coming: COVID-19. I think we can admit that for a lot of people, the pandemic wasn’t a great time. But it was even worse for people without sight.

Ryan: I was feeling a lot of apprehension in places that I’d never felt apprehension before! Like a grocery store or, you know, my local coffee shop. Just wondering, “where can I stand in the room where I’m not in somebody else’s bubble and nobody’s going to be in my bubble?” 

I was definitely not looking forward to potentially catching this, mostly because I thought, I don’t want to lose my taste. I already can’t see; I—I don’t want to lose my taste and smell.  And so, I was incredibly careful. I would say, maybe more cautious than others about trying to keep my distance, and, and also really being concerned that I didn’t want to be that vector that brings that to somebody else, either.  

And it was like, “Okay, you know what? That, that, that feature that we’d been talking about doing — why don’t we do it now? Why don’t we do it, why don’t we do it right now?” 

In other words, this idea of using Lidar to detect how far away people were now had two purposes. It could help blind people know when it’s time to move ahead in a line—and it could help anybody know if they’re observing six feet of social distancing. 

But Ryan’s first experiment with a prototype of the app, back in 2020, wasn’t exactly a triumph.

Ryan So we built this prototype and I took it out onto the streets of San Francisco. And the first thing that we encountered was, “Oh my gosh, this is a cacophony of feedback!” It was detecting poles. It was detecting garbage cans. It was detecting dogs and people in cars and all sorts of things. 

I’m hearing, you know, “eight three five eight nine six three five.” I’m like, whoa, okay, what’s going on here? And then with the sound feedback, it was just all over the place. You know, the beeps were close together, or further apart, and it was just — I didn’t even know what object was being detected at that point.

And at that point, it wasn’t actually a useful tool.  Everything was kind of setting me off in this really weird walking pattern of — no different than if my cane had been hitting a bunch of objects that make me stop and think for a moment. 

David Right. 

Ryan And so, we started to consider, what are the other technologies we can use here? 

The answer was to rope in the Apple engineers who worked on augmented reality. 

Ryan: ARKit, which is our augmented reality software development framework for developers, has a feature called People Occlusion. So you may have used an augmented reality app where a body part like an arm or even another person gets in the way of the view, right? The objects that you’re virtually viewing in your augmented reality game or, you know, whatever the environment is — when people get into view, they block it off. That’s actually incredibly useful. It’s amazingly powerful, and that is part of the machine learning process.

Here’s an example of People Occlusion. There’s a super cool, free augmented-reality app called JigSpace, that lets you place your choice of all kinds of 3-D objects right in front of you, as viewed on your phone’s screen: a life-sized printing press, or lunar lander, or combine harvester, or whatever. And right there in your living room, you can walk around this thing, come up close to it, and so on. 

Well, suppose I’ve got this app, and now there’s a huge coral reef in my living room. Hey—it could happen.

Sound: Coral

Now, if somebody walks in front of that reef, close to your phone, their body looks like it’s passing in front of the coral wall. But if they walk far enough away from you that they should be behind the reef, the reef blocks them.

Turns out that’s just the sort of intelligence Ryan’s app needed to distinguish people in the environment from random clutter—and to ignore his own body parts in the scene.

Ryan For example, if I’m using my cane and my hand on my cane comes into view from the bottom, OK, we can ignore that. I don’t need to be detecting my feet or my shirt or a finger; I need to be detecting the person in front of me. And so we’ve been able to really fine-tune this so that we can pick up just the people and not the dogs and the trashcans and the poles. And this results in a fantastic tool.
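Apple hasn’t published the People Detection code itself, but the ingredients Ryan describes, ARKit’s people segmentation plus lidar depth, are available to any iOS developer. Here’s a minimal, hypothetical Swift sketch of wiring them together; it shows the public ARKit configuration involved, not Apple’s actual implementation.

```swift
import ARKit

// A minimal sketch of the ingredients described above: ARKit's people segmentation
// plus lidar depth. This is not Apple's People Detection code, just the public
// configuration any developer can use to get "person pixels" and their distances.
final class PersonDistanceSketch: NSObject, ARSessionDelegate {
    let session = ARSession()

    func start() {
        // People segmentation with depth requires recent hardware; check before enabling.
        guard ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) else { return }

        let config = ARWorldTrackingConfiguration()
        config.frameSemantics.insert(.personSegmentationWithDepth)
        session.delegate = self
        session.run(config)
    }

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        // segmentationBuffer marks which pixels belong to people;
        // estimatedDepthData gives a per-pixel distance estimate for those people.
        guard let people = frame.segmentationBuffer,
              let depth = frame.estimatedDepthData else { return }

        // A real app would walk these buffers, find the nearest "person" pixel,
        // and turn that distance into sounds, speech, or haptics.
        _ = (people, depth)
    }
}
```

From there, a real implementation would scan the person pixels for the nearest depth value and turn it into the sounds, speech, and haptics you’ll hear about in a moment.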

By the way—there’s a good reason that no other phones have a feature like People Detection: It really needs the Lidar. Chris Fleizach’s team briefly explored reproducing the feature using only regular iPhone cameras to see the world. But regular cameras don’t see the world in 3-D, the way Lidar does.

Chris And so, yeah, hard decisions were made. We would have loved to have brought this to more devices, but it wouldn’t have been good enough. 

So—in the beginning, the phone thought that everything in the environment was a person. But in the end, the solution was combining the depth information from the Lidar with the screening-out abilities of the augmented-reality software kit. 

If you have an iPhone 12 Pro or 13 Pro, you can try the People Detection feature yourself. It’s hiding in the Magnifier app. You open Magnifier, and tap the Settings sprocket. Tap People Detection. You can specify your social-distance threshold, like 6 feet, and also how you want the app to tell you when it detects people nearby.

You have three options: Sounds, which plays faster boops as someone gets closer to you and then drops to a lower note once they’re within six feet;

App: Boops

…Speech, which speaks how far away the nearest person is, in feet or meters; 

App: Speech counter

…or Haptics, which uses little vibrational clicks that play in sync with the boops. You can turn ‘em on in any combination. Here’s what it sounds like in your earbuds if you have all three turned on—in this case, as I walked past somebody in the drugstore.

App: Seven. Six. Five. Six. Seven. Eight. 
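Just to make the mapping concrete, here’s an illustrative Swift sketch of how distance might translate into feedback: faster beeps as the person gets closer, a lower tone once they cross your threshold, and silence beyond about 20 feet. The curve and pitches here are made up; Apple’s real tuning isn’t public.

```swift
import Foundation

// An illustrative sketch of the feedback logic described above, not Apple's actual tuning.
// Closer person: faster beeps. Inside the chosen threshold: the tone drops.
// Beyond roughly 20 feet: the person is treated as out of range.
struct ProximityFeedback {
    var thresholdFeet: Double = 6.0     // the user's social-distance setting
    var maxRangeFeet: Double = 20.0     // beyond this, no feedback at all

    // Seconds between beeps: shorter when the person is closer.
    func beepInterval(forDistance feet: Double) -> Double? {
        guard feet <= maxRangeFeet else { return nil }    // out of range: silence
        return 0.1 + 0.05 * feet                          // e.g. 4 ft -> 0.3 s, 15 ft -> 0.85 s
    }

    // A lower tone once someone crosses inside the threshold.
    func toneFrequency(forDistance feet: Double) -> Double {
        feet <= thresholdFeet ? 440.0 : 880.0             // arbitrary illustrative pitches
    }
}

// Example: someone steps from 8 feet away to 4 feet away.
let feedback = ProximityFeedback()
let farBeeps  = feedback.beepInterval(forDistance: 8)     // 0.5 s between beeps, higher tone
let nearBeeps = feedback.beepInterval(forDistance: 4)     // 0.3 s between beeps, lower tone
```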

Here’s Ryan again.

Ryan So, one subtle thing you might notice is that if you’re not using headphones, instead of hearing the feedback left to right across your soundstage, it will change in volume to indicate, you know, how centered the person is.

APP: Stereo

Oh, that’s cool! So yeah. If you’ve got both earbuds in, you can hear where the person is, left to right, as they cross your path.

APP: Stereo

If you’re not wearing earbuds, the volume indicates if the person is centered in front of you.

APP: Boops

Ryan has it set up so that two quick taps on the back of the phone turn the detection on, so he can whip the phone out unobtrusively and, thanks to those haptic clicks, start sensing where people are around him.

Here’s how Ryan heard the world as he approached a food truck to pick up his order. You can also hear him talking to his guide dog.

Sound: Foodtruck

Ryan It took away a lot of the apprehension, especially in places like waiting in line at the grocery store to check out. And this was especially true before we had the vaccines. 

David Oh, yeah. 

Like everybody else, Apple’s accessibility team is starting to dream of a time when this pandemic is over. What will happen to social distancing then? 

Ryan: Right now, we may be using it for keeping track of where, you know, where people are, what’s my bubble? But in the future, you may want to have a different threshold for how far away are people. 

So, for example, our sound feedback provides tones: when they’re playing very fast, the person’s very close.

When they’re playing slower, the person’s further away. But as you cross over the threshold that you’ve set, which by default is six feet, it drops in pitch

APP: Fast-slow boops

Ryan: … and then the distance between the tones also increases until you really kind of get out of range — until that person is around 20 feet away and then they’re not detected. 

David Right. 

Ryan In the future, this is going to be useful for walking into that crowded coffee shop and looking for that quiet corner, or  getting on to a subway car and figuring out like, where’s the empty space I can go sit down?

[pause] 

As you’ve probably figured out, I love this stuff. Some of the cleverest, most magical work going on in Silicon Valley today is in these accessibility features—and so few people even know they exist! 

And you may be thinking, “Well, it doesn’t matter if I know about these features. I’m not disabled.” But the accessibility engineers I interviewed made one point over and over again: Almost every time they dream up a feature for the disabled community, it turns out to be useful for the wider public. 

Here’s Apple’s Sarah Herrlinger:

Sarah There are many people who don’t self-identify as having a disability. And yet accessibility can be more about productivity for those individuals. If you go under the accessibility panel on any of our devices, and take a little bit of time to investigate what’s there, I would say you’re probably going to find something that will make your life easier.

Microsoft’s Saqib Shaikh:

Saqib There’s a whole history of disability as a driver of innovation. And I could tell you a dozen stories. The fact that our phones can talk to us, this was part of the invention of the first reading machine.

Same with voice recognition; that was for people with physical impairments. Even the touch screen, to some extent, was invented by someone who had difficulty typing and wanted a lower-impact way to type text messages for people who were deaf.

Microsoft accessibility head Jenny Lay-Flurrie:

Jenny:  Captioning! Captioning came out of creating technology for the deaf. We all use captioning in different ways. We can be sitting on a train, or trying to sneak that video without the person next to you watching. 

Audiobooks is the same. Those were created as talking books for the blind. And now look at what’s happening with audiobooks. 

Saqib: People kind of forget where the origin came from. And that’s a good thing, because it just blends into the fabric of life.

Jenny: By making something accessible, you don’t just make it inclusive to the cool people with disabilities. You actually give core capability that everyone can benefit from. 

I told Jenny the story of that conference eight years ago, where I saw someone using VoiceOver on the iPhone for the first time. And her response is epic.

David She was using VoiceOver to operate her iPhone with the screen off, I mean, completely off.

Jenny Absolutely. The way that that individual who’s blind is using her phone actually is a much more efficient use of the phone than someone who’s sighted; the screen’s off, the battery power lasts longer. And I’ll tell you, if they’re using a screen reader, they’re listening to that sound at way above your normal speed. They can get through an audiobook in half the time. I can sit in a room and understand what’s being said, not by hearing any audio, but by watching what’s happening. I can put the pieces together of what people are saying, and I don’t have to be within two feet of someone. I’m great at a party, from that perspective.

[music]

Disability’s a strength. 

We have strengths and we have expertise.

The one thing that we need to stop doing as a society is seeing disability with a sympathy lens.

So when you see a person with a disability, forget the “diverse abilities,”  “super ability,” “special ability.” No, say the word “disability.” We’re proud of our identities. We’re proud of who we are, and we’re experts in who we are. And just not with sympathy, but with empathy.