Drive Thru Automation
DJ: Welcome to AI Coffee Shop. May I take your order?
Jerry Cuomo: Hi. Yes, yes. Can I order a large cappuccino with half and half and two sugars? No, no, no. Two Splendas. Do you have Splenda? Well actually, skip the sugar. Skip the sugar.
DJ: Okay. One large cappuccino.
Jerry Cuomo: Actually, scratch that. I'll just take something decaf. Maybe just an Americano, a decaf Americano. Oh, sorry about the dogs. What did I order?
DJ: An Americano and a cappuccino?
Jerry Cuomo: No, just an Americano. I scratched the cappuccino, remember? And it's a decaf.
DJ: Ah, cappuccino scratched.
Jerry Cuomo: Oh, maybe I'll have one of those jumbo cookies too. Oh no, no, wait. Too much sugar. Forget that.
DJ: Okay, let me confirm. You're ordering a decaf Americano, and you're skipping the cookie because you're trying to cut down on sugar. And I have a biscuit here for your dog. Did I get that correct? Will there be anything else?
Jerry Cuomo: Wow. I mean, no, that's perfect. Thank you.
DJ: Great. Then drive up to the next window to pay and pick up your coffee.
Jerry Cuomo: Welcome to the Art of Automation, a podcast that explores the application of automation in the enterprise. Folks, that was DJ and I setting the scene for this cool episode on how AI and advanced automation technology can make sense of the ramblings and noise that you just experienced in our reenactment of, well, a typical drive-through ordering experience. Our guest today is Brian Langner. Brian is an expert in AI and natural language processing, and he knows how to make computers talk as well as listen. And with that, I'd like to welcome Brian to the Art of Automation. Welcome, Brian.
Brian Langner: Hi, Jerry. It's great to be here.
Jerry Cuomo: Let's get right to this first question because I have so many things I want to chat with you about. It's a great topic. So can you start by please sharing with our listeners why you love what you do?
Brian Langner: Well, it's interesting. I have a background doing speech and conversation technology, and I'm one of those lucky people who actually has a career in industry doing exactly what I did in my graduate studies.
Jerry Cuomo: Wow.
Brian Langner: I don't know how I managed to do that, but I'm pretty lucky.
Jerry Cuomo: That's fantastic.
Brian Langner: The reality is I've been interested in computers and AI since I was little, and this is a great way to explore that technology and do what I like. I make computers talk. That's kind of the one-liner pitch for what I do. I also like to joke that sometimes I make them listen, and listening is actually way harder than making them talk.
Jerry Cuomo: So yes, that is the word on the street, Brian, that you make computers talk. So how do you automate a computer voice in a way that people like, perhaps even feel is friendly or even approachable?
Brian Langner: That's a great question. There's a lot of character that goes into people's voices, and when you're trying to make a computer interact with a human by voice, you want to have some character with it. And so in my background, both in my time in graduate school as well as in industry, I have built voices, and one of the things that we've done is try to provide a personality.
Jerry Cuomo: I see.
Brian Langner: You don't want to go too far down that path. You get into an uncanny valley really quickly about whether or not the person understands they're talking to a computer. You do want it to be a little bit personable, with character, but it should still be somewhat obvious that this is not a natural human you're talking to.
Jerry Cuomo: I see. So you have to preserve some of the robotic aspects of it or else it gets creepy?
Brian Langner: It can get creepy really quickly, yeah. And I think the other thing that's been interesting in my past is if you have a very natural sounding voice and you're talking about a conversational system that interacts with a person, the person on the other side will assume that the computer on that side actually is better at doing this than it is.
Jerry Cuomo: I see.
Brian Langner: So you don't want it to be too natural. If it's too natural, then people assume that there's a human on the other side and they can speak completely naturally and fluently, and it turns out the machine is not always great at understanding all of that.
Jerry Cuomo: Hey Brian, I heard you were a part of a company called ToyTalk. That sounds really interesting. In fact, I think you were even one of the founders of it. So can you share with our listeners a little bit about that experience?
Brian Langner: Yeah, so I was one of the founding engineers at ToyTalk. That is the fancy way of saying that I agreed to join before we had any money. But yeah, it was a really fun experience. It was a group of folks, a lot of people who had come from Pixar, had backgrounds in character and technology and combining that to basically be a creative engineering organization.
Jerry Cuomo: Any toys that we would know, Mr. Potato Head or anything like that?
Brian Langner: So the one that everyone would know is Barbie. So we did build an internet-connected conversational Barbie doll, sold it...
Jerry Cuomo: Cool.
Brian Langner: ...in I want to say 2015 or 2016. I don't remember the year anymore. I actually have one in my house, although it no longer works. But yeah, you could have a conversation with Barbie.
Jerry Cuomo: So take us from ToyTalk to automating order-taking.
Brian Langner: On some level, they're not actually that different of a problem, at least from the technical perspective. You're trying to get a computerized agent of some kind to interact by voice with a person and have a productive interaction. When you're talking about something like a Barbie doll, that's for entertainment purposes. So it turns out you don't always have to be perfect.
Jerry Cuomo: Right.
Brian Langner: Because if you say a non sequitur, sometimes, kids actually think that's funny and that's better than doing it right in the first place.
Jerry Cuomo: Right.
Brian Langner: When you're talking about automating a drive-through, what you'll end up with there is yeah, you do need to get closer to perfect because someone's actually on the other side hungry, trying to get some food. The good thing that helps us in this particular application is that it's interactive. So if we don't quite understand something the first time around, we can ask a follow-up question and hopefully, we'll have a better chance the second time.
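The follow-up behavior Brian describes here can be pictured as a simple confidence check. What follows is a minimal Python sketch, not the Watson Orders implementation: the `Hypothesis` structure, the `next_prompt` function, and the 0.8 threshold are all assumptions made up for illustration, and the confidence score is assumed to come from some upstream speech recognizer.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One guess at what the customer said (hypothetical structure)."""
    item: str
    confidence: float  # 0.0 to 1.0, assumed to come from an upstream recognizer


def next_prompt(hyp: Hypothesis, threshold: float = 0.8) -> str:
    """Accept a confident guess; otherwise ask a follow-up question."""
    if hyp.confidence >= threshold:
        return f"Okay, one {hyp.item}. Anything else?"
    # Low confidence: the exchange is interactive, so ask again
    # instead of committing to a guess that may be wrong.
    return f"Sorry, did you say {hyp.item}?"


print(next_prompt(Hypothesis("decaf Americano", 0.93)))
print(next_prompt(Hypothesis("cappuccino", 0.41)))
```

The point of the sketch is only that an interactive system gets a second chance: a low-confidence guess turns into a question rather than a wrong order.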
Jerry Cuomo: Hey, Brian, that makes a lot of sense. Now, can you share some of the challenges? I mean, when I'm ordering at a drive-through, I find myself rambling as I try to make up my mind. Oh, and my dogs are maybe barking in the backseat. And so how does AI, natural language processing, machine learning bring order to this noise and rambling?
Brian Langner: The first thing to notice is that the technology that my team works on, it is audio only. And so we're only using the signal we get from the microphone. When you start talking about challenges, that's one of them.
Jerry Cuomo: I see.
Brian Langner: People have a lot of visual cues in terms of how they respond to speech and in conversation, and our system doesn't have access to that. I think the other thing you'll note is that drive-through microphone and audio systems aren't, stereotypically, I would say, the best.
Jerry Cuomo: Yeah, I would agree.
Brian Langner: The reality is that the human person whose job it is to take orders in the drive-through listens to something that sounds like for about eight hours a day. And they're supposed to turn that into, "I'd like a double cheeseburger and a large Coke." Our system has to do that. We don't have any other signal to work with. And so we've needed to build speech technology that's robust to those kinds of conditions. The reality is also drive-throughs are outside, so you have outdoor noise. In places that are not California, you have weather. It might be windy or raining or sometimes hail, and the microphone is sitting in a giant metal box, so it's not the greatest of acoustic environments.
Jerry Cuomo: I see.
Brian Langner: The other thing is you have a car engine that's idling two feet away from the microphone. I like to joke that the solution here was we should just buy everyone who goes through a drive-through a Tesla and then we're fine. But it turns out that was more expensive, and companies didn't want to pay for that. So it's our job to actually make the microphone and the speech recognizer work in that challenging acoustic environment.
Jerry Cuomo: Right.
Brian Langner: I think the other thing you noticed is the thing you said. You tend to ramble a bit. People are familiar with what the drive-through experience is like. It's been around for 50, 60 years. You pull up. There's a giant sign with what there is to order. You start talking. You get your food out the other end. Well, what that means is very often, people will start talking before they know what they're going to say, "So I'd like the number 5, 4 with a Coke. No, it should be a diet Coke today. I'm trying to lose weight. And yeah, large fries. Actually, you know what? Never mind. Just get me the McNuggets."
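One way to picture what the system has to end up with after an order like that is a running list that each correction edits. The Python sketch below is illustrative only and assumes the hard part, turning noisy speech into clean events, has already happened. The `final_order` function and the event names are hypothetical, and the event list is one reading of Brian's example (treating "number 5, 4" as a number 4 meal and "Never mind" as clearing the order).

```python
def final_order(events):
    """Replay (action, item) events in order and return what is left."""
    order = []
    for action, item in events:
        if action == "add":
            order.append(item)
        elif action == "remove":
            # Ignore a correction for something that was never ordered
            if item in order:
                order.remove(item)
        elif action == "clear":
            order.clear()
        else:
            raise ValueError(f"unknown action: {action}")
    return order


# One possible reading of the rambling order in the example
events = [
    ("add", "number 4 meal"),
    ("add", "Coke"),
    ("remove", "Coke"),       # "No, it should be a diet Coke today."
    ("add", "Diet Coke"),
    ("add", "large fries"),
    ("clear", None),          # "Actually, you know what? Never mind."
    ("add", "McNuggets"),
]
print(final_order(events))
```

Replaying the events leaves a single item, which is what the customer actually wanted, even though most of what they said was about things they did not order.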
Jerry Cuomo: I'm sorry, Brian. Just by you going through what you did, it seems like an impossible problem to solve. What's the accuracy?
Brian Langner: So our target for our technology is to be able to automate 75% or more of orders.
Jerry Cuomo: Wow.
Brian Langner: And where we say automate, we mean be able to take the order from start to finish without having to have one of the human employees come in and take over. Our technology today is approximately in that range, and we've made some substantial improvements over the past several months, and we expect that trend line to continue.
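The metric Brian defines here is simple arithmetic: orders completed without a human takeover, divided by all orders. The short Python sketch below works that out for a made-up shift. The `automation_rate` function and the sample data are assumptions for illustration; only the 75% target comes from the conversation.

```python
from typing import Sequence


def automation_rate(orders: Sequence[bool]) -> float:
    """Fraction of orders finished without a human takeover.

    Each entry is True if the system took the order from start to
    finish, and False if an employee had to step in.
    """
    if not orders:
        raise ValueError("no orders to score")
    return sum(orders) / len(orders)


# Hypothetical shift: 8 orders, 2 of which needed an employee
shift = [True, True, False, True, True, True, False, True]
rate = automation_rate(shift)
print(f"{rate:.0%} automated, target met: {rate >= 0.75}")
```

Note that under this definition an order counts as a miss whenever an employee takes over, regardless of how much of the order the system had already handled.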
Jerry Cuomo: That's impressive, Brian. So now, where are the breakthroughs coming from? Is it around a different type of machine learning? Is it different type of natural language? Is it software, hardware, more training? All of the above? None of the above?
Brian Langner: I mean, it is kind of all of the above. As with most kinds of automated technology, in particular, speech and language technologies, there's a lot of different things that can go wrong. None of them, individually, are necessarily 10% of the problem, but in combination, the total mass of the things that we're not doing well is substantial. And so it's a matter of finding the lowest hanging fruit to fix, but also, how many of them are worth the effort. If it takes three months to fix a thing that has 0.2% improvement, maybe that's not the best way to spend the resources and we can try and do some things that get 1 to 2%. And so that's kind of where our team has been for the past year and a half is working on problems like that. Yeah, I would say the biggest improvements we've had in the past several months have been around getting the speech recognizer to be more accurate and in particular, more robust to the kinds of environments that we actually see in the real world.
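The trade-off Brian describes, improvement gained versus time spent, can be sketched as ranking candidate fixes by estimated gain per month of effort. This is a rough Python illustration, not the team's actual process. The `Fix` structure, the `rank_fixes` function, and the fix names are hypothetical; the 0.2% in three months figure and the 1 to 2% range come from the conversation, and the remaining effort estimates are invented.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    name: str
    gain_pct: float  # estimated improvement, in percentage points
    months: float    # estimated engineering effort


def rank_fixes(fixes):
    """Sort candidate fixes by estimated gain per month, best first."""
    return sorted(fixes, key=lambda f: f.gain_pct / f.months, reverse=True)


candidates = [
    Fix("rare edge case", gain_pct=0.2, months=3.0),
    Fix("noise robustness", gain_pct=2.0, months=3.0),
    Fix("menu synonym", gain_pct=1.0, months=1.0),
]
for fix in rank_fixes(candidates):
    print(f"{fix.name}: {fix.gain_pct / fix.months:.2f} points per month")
```

A ratio like this is only as good as the estimates behind it, but it makes the reasoning concrete: three months for 0.2 points ranks well below a quicker fix worth a full point.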
Jerry Cuomo: Amazing.
Brian Langner: There are drive-throughs that border a freight rail yard and so 1:00 every day, there's a giant freight train that comes by and blows its horn for 10 seconds at 100 dB. It turns out that the employee, the human employee there actually can't deal with that either.
Jerry Cuomo: Right, right.
Brian Langner: Because it's so loud that it just obliterates everything. And so those are the sorts of challenges we've been trying to work around. Our team has done an okay job so far about finding the biggest things that need improving and adding new data, adding new modeling techniques, and working on modern deep learning approaches to solve these sorts of problems.
Jerry Cuomo: Wonderful. So Brian, you're a leader today in a team called Watson Orders. Can you just tell us a little bit about how you got here?
Brian Langner: Well, that's a really interesting question. It's not where I thought I would be. I actually got involved with this group of people when it was a startup called Apprente. So this is a Silicon Valley group, almost straight out of a TV show perhaps, where it's 15 to 20 people in a tiny little office that's overcrowded. It looks a little bit like a garage, almost. I come in for my interview with this group and there's actually a guy taking a nap on the couch. It was really kind of straight out of a TV show. So it was a great group of people, really quirky, really talented, really good at what they do. And then the week after I interviewed, they announced they were being acquired by McDonald's. That McDonald's, yes. And so, that's not quite what I thought I was getting myself into. And it's why is McDonald's interested in an advanced technology team? Well, it turns out that they were interested in automating drive-through taking, order taking. That's actually a pretty reasonable application of speech and conversation technology. It's a targeted domain. There's a goal involved. And so the people who are involved in this, they know what they want to do. There's an objective for why they're there.
Jerry Cuomo: I see.
Brian Langner: They're not there just to play with the toy. They're there because they want food. It's a voice-based interaction already, so there's no real retraining of the customer base. They already know what they need to do. And then it's just a matter of can we get the computer to do some of the more challenging tasks? And that's what this team would do. And so we took a prototype that the Apprente team had built. We turned it into a real software product over the course of two and a half years with McDonald's. And then, to make that next step, we needed the backing and resources of a larger group, and McDonald's understood that, and that's how we ended up at IBM as part of the Watson Orders team.
Jerry Cuomo: Wonderful. Now, I have to kind of rib you a little bit. Are there any mean-spirited colleagues, maybe former colleagues from Carnegie Mellon, where I believe you graduated, that joke with you about playing with toys and working at the drive-through at McDonald's?
Brian Langner: I mean, honestly, no. All my colleagues and friends have been pretty good about that, but-
Jerry Cuomo: That's good because you have the coolest job on the planet.
Brian Langner: ...I actually have been the one kind of giving myself the ribbing on that one. So when I left my previous job to join the McDonald's Tech Labs team that became Watson Orders, I joked with them. Yeah, I quit my job and oh well, what are you doing now? Well, I work at McDonald's. McDonald's? Yeah, no, no, I take orders for the drive-through. And I got really good at being deadpan about that and it was a lot of fun. One of my friends actually, she came up to me after I said that and just put her hand on my arm and was like, "Are you okay?" Because they know I have an advanced degree. I've been working in technology for years. Why are you working at McDonald's? And then, of course, I laugh, explain, and then they'd roll their eyes and they're like, "Oh, okay, I see what you're doing. You're pulling my leg."
Jerry Cuomo: Fun story, Brian. So Brian, what does your crystal ball tell you about the future here? What's the future state and perhaps, what other industries might benefit by applying what you've learned around drive-through ordering?
Brian Langner: I mean, speech and conversation technology is at a little bit of an inflection point right now. I think we've gotten enough advanced technology to the state where it works well in a variety of circumstances that we can start to talk about applying it to real world problems like this. I think things like Siri and Alexa are great in terms of pushing the field forward, but by themselves, they don't necessarily have a goal behind them.
Jerry Cuomo: I see.
Brian Langner: This sort of technology that we're building at Watson Orders is goal-directed, and I think that's one of the reasons why we've had a bunch of success with it because the user base is inclined to work with us to solve a problem together.
Jerry Cuomo: Perfect.
Brian Langner: I think applying that technology to similar sorts of things, yeah, we're going to see that increasing over the next several years, ideally, a lot of it being the Watson Orders technology that my team and folks are building, but I suspect there'll be lots of different entries into the space and I expect it to actually take off for certain kinds of applications.
Jerry Cuomo: It's amazing, Brian. So unfortunately, we're out of time. I can go on and on talking to you about this subject, and I want to thank you for joining us on the Art of Automation podcast.
Brian Langner: I can too.
Jerry Cuomo: Very cool topic. Thank you so much, Brian.
Brian Langner: Well, Jerry, thanks for having me. I've enjoyed it.
Jerry Cuomo: Well, that was fun. And that's it for today. I've attached some links to information on Brian's project, which is IBM Watson Orders, in the description section of this podcast. Oh, and if you enjoyed this podcast, it's also likely that you'll enjoy the Art of Automation book, which is now available. A link to the book is also in the description section, and royalties for the Art of Automation book are being donated to the American Cancer Society. Okay, once again, I'd like to thank Brian and, of course, I'd like to also thank you all for listening. Again, this is Jerry Cuomo, IBM Fellow and VP for technology at IBM. See you again on an upcoming episode.
DESCRIPTION
Host Jerry Cuomo kicks off Season 3 with an exciting behind-the-scenes look at how AI and automation technology are transforming how orders are taken at drive-thru restaurants. "I make computers speak" is the calling card of guest Brian Langner. Brian and Jerry discuss the challenges and rewards associated with conversational AI. Brian adds to his calling card, "however, the harder part is getting them to listen."
Brian, who currently leads AI Research and Projects for Watson Orders, highlights his journey from university studies in AI, to Silicon Valley startup ToyTalk, then to Apprente, which was acquired by McDonald's, and now to IBM following its acquisition of McD Tech Labs.
Brian and the Watson Orders team are developing and testing Automated Order Taking (AOT) technology in restaurants showing substantial benefits to customers and the restaurant crew experience. Brian further explains how AI and natural language processing will help scale the AOT technology across markets and tackle integrations including additional languages, dialects and menu variations.
Connect with Jerry on LinkedIn here. And Jerry’s Digital Twin (DJ) here.
Also buy the Paperback of Art of Automation here (Royalties are donated to the American Cancer Society).