A speech-recognition software expert explains the difference between good design and ambiguity, how good designs go bad, and why everyone is a designer.
Blade Kotelly is the Creative Director of Interface Design for SpeechWorks International and author of the book "The Art and Business of Speech Recognition."
UBIQUITY: Terry Winograd at Stanford has described you as "a virtuoso voice interface designer"; let's start there and ask you to tell us just how you became a virtuoso.
KOTELLY: Well, it started when I realized that I couldn't really understand ambiguous situations or ambiguous signs. For example, when I was a kid and saw the road sign "Ped X-ing" it didn't make any sense to me, and I first thought "Ped X-ing ... Boy, what does that mean?" Aha! "Ped" means "pedestrian" and "X" means "cross". Okay, X for cross. So I could kind of get that. But then comes "X" meaning something else entirely, as in "X-Mas" for Christmas. These kinds of ambiguities became really intriguing to me. Little did I know that such ambiguities would lead me into a career trying to make sure things were as unambiguous as possible.
UBIQUITY: What was your college major?
KOTELLY: I was in the Human Factors (engineering-psychology) program at Tufts University. It was a great program because it didn't focus so much on computer GUIs as on the things people use -- from toothbrushes to vacuum cleaners, and of course software. Then I wound up joining Wildfire Communications about 7-1/2 years ago and doing usability work there. My time at Wildfire was like getting a master's degree working on one product -- and one project -- for a year and a half. After Wildfire I joined SpeechWorks 5-1/2 years ago to do design work and to usability-test my own designs.
UBIQUITY: Why don't you go ahead and say a few words here about Wildfire and how it works.
KOTELLY: Okay. The company called Wildfire made a piece of software also called Wildfire, and it was basically voice mail on steroids. You would speak the commands (such as "Throw it away" when you were done listening to a message). Here's how it worked: someone would call your Wildfire service, and Wildfire would then call your phone and announce who was calling, so you could decide either to have Wildfire take a message for you or to put the call through. You could easily return a call from a message, because Wildfire announced who the call was from and had already captured that person's phone number. After you heard the message you could say, "Give them a call," and it might ask, "At the number they left me?" So it was a great way of managing your messages and your contacts easily.
UBIQUITY: During the year and a half you were with Wildfire, did you see much evolution in the product?
KOTELLY: Absolutely. It went from a box you installed in a company to a system that could be used in a telephone network, so of course things changed. It went from being a little hard to learn to being really easy to learn, because when you're going to deploy it to hundreds of thousands of people you need a strategy for that. It was important to limit the functionality in certain ways but expand it over time for users who needed more. If you kept calling the same number, it could recognize that and say, "I know that you keep calling the same number often. If you'd like you can get the 'contacts' feature, which would allow you to call people by name." That was a great way of having people learn how to use the system without imposing too high a cognitive load when they first started using it.
UBIQUITY: After Wildfire you moved to your present position. Tell us about that.
KOTELLY: Yes, I joined SpeechWorks back in, my God, '97 -- it seems so long ago now -- and have been doing design work here. I've seen the team grow from just me -- the first person hired to be a pure-play designer -- to about 25 designers scattered around the world. In addition to the hard-core designers, all of our programmers are incredibly design-sensitive.
UBIQUITY: Going back to Terry Winograd's characterization of you as a "virtuoso," are there many design virtuosos out there?
KOTELLY: I'd have to say there are very few. While it is somewhat straightforward to design a speech application, it's even easier to design a bad one.
UBIQUITY: Let's talk about that. How are the non-virtuosos likely to screw up?
KOTELLY: The most common screw-ups are due simply to lack of observation and failure to be aware of ambiguity in language.
UBIQUITY: Ah, yes. Ped Xing and Merry Xmas.
KOTELLY: I saw a great sign yesterday as I walked to work. Evidently there were "DANGER MEN" working above me. The sign said DANGER MEN in very large letters, with the word "WORKING" written above it, very small. A better version would have been a large "DANGER" and a smaller "Men Working." When you look at it, you can generally understand what they mean. But if this were a prompt in a speech system, emphasizing the wrong word could cause usability problems. Not only is something like that difficult to catch when designing a prompt to convey the idea correctly, but there is also a sociological challenge: "How do I convey these ideas so that people get them, and so that the language doesn't stand in the way by being either too colloquial or too formal?"
UBIQUITY: The prompts can be either colloquial or formal?
KOTELLY: Yes. Sometimes a formal phrase is very clear. After all, many grammar rules arose out of a need to be clear and to have some sort of consistency.
UBIQUITY: What other kinds of difficulties are involved for the non-virtuoso?
KOTELLY: Well, one confounding variable is that people often take a reasonably well-written script, give it to a voice talent, and hope for the best -- instead of directing the talent, they just hand the script over the wall and the voice talents direct themselves, as often happens with touchtone systems. Yet any complex idea that's not directed can come across very differently than intended. We've all heard bad versions of Shakespeare. Good Shakespeare is easy to understand. Bad Shakespeare is really tough. As an example of what I mean, take the phrase, "Would you like the gate information for an international flight?" It can be recorded at least three ways. The first is as a simple question. The second would be, "Would you like the GATE information for the international flight?" I'm punching the word "gate" to contrast it with some other kind of information that I might have available. Or I might say, "Would you like the gate information for the INTERNATIONAL flight?" -- i.e., the international rather than the domestic flight. With these short questions, I can convey lots of meaning without using extra words, but if the voice talent isn't directed appropriately, the meaning gets lost.
UBIQUITY: Whereas at the virtuoso level ... ?
KOTELLY: At that level you direct the voice talent to do all these things correctly. You make a beautiful script, and then you direct them to convey the right meaning. And of course there's one other important layer -- conveying sincerity.
UBIQUITY: How do you do that?
KOTELLY: Oh, by casting really good voice talents, and by being sensitive to how people really talk and how they convey ideas when they're trying to elicit a response. It just takes years and years of practice, practice. It was Laurence Olivier who said something along the lines of, "Sincerity, sincerity. Once you can fake that you can achieve anything." So here I need to have these voice talents pretend they're actually asking you a question.
UBIQUITY: And apparently it's not always easy to do this.
KOTELLY: Right, it's not. For example, there's a Delta SkyMiles guy who says, "If you're a Delta SkyMiles member, Press or Say . . . ONE" in a very self-important sort of way, in a voice that makes you think you're in the Coliseum and he's asking whether the crowd would like to send you to the lions. "Press or Say (BIG PAUSE) ONE." It doesn't sound like he's talking to YOU. The objective in a voice prompt, of course, is to talk naturally, because you want psychological buy-in from the caller. This level of sincerity gives you a psychological connection with the person listening. There are some very interesting studies by Cliff Nass and Byron Reeves about how people treat computers like real people. The prompts need to be done right so that the listener has more than just an intellectual reaction, more than just "Do I understand it or not?" The listener needs to decide whether he wants to deal with this system or not: "Do I want to talk to this virtual person?"
UBIQUITY: By the way, did you ever solve the Ped Xing problem? Here's a similar problem. A fellow looking for a parking space sees a sign saying "Fifteen minute parking," but he decides, "I don't have 15 minutes." How could a circumstance like that be dealt with?
KOTELLY: That's excellent. If the system were well designed, it should deal with that just fine. You know, there's another parking problem: "Space reserved for Infant Parking." I didn't know they let kids that age drive! It's at the Stop & Shop where I live. It's a great example.
UBIQUITY: You do a lot of speaking engagements with various audiences, right? What kinds of groups do you speak to?
KOTELLY: Everything -- from colleges like Tufts, Stanford, and Harvard, to corporate audiences who want to acquire the skills to do this themselves, to prospective clients of course, and to conferences on speech. I recently spoke at a high-level technical conference that happens every two years, and my major point was that what drives technology should be what we WANT to do with it, as well as what we CAN do with it, right? Design is what pushes technology toward what we need to do with it.
UBIQUITY: Your design program at Tufts was very broad; how did you decide to focus on speech?
KOTELLY: A little bit of dumb luck helped, and a little bit of interest in phones. When I was 12 or 13 I went to a school that everyone commuted to, so all my friends lived very far away, and I got a phone line of my own at an early age. When you want to share a CD with your friends, you play part of the song to them over the phone. You soon learn that people can't hear most of the music over the phone, but of course you're a teenager, and that doesn't stop you. Then you start getting a sense of what does and doesn't get transmitted through the phone, and you ask yourself, why wasn't that clearly understandable to my friend over the phone? And you start thinking about this in acoustical terms. A little later I began to compose music, which is very similar to what I do now, because you have to design a structure of sound that will be produced by a real instrument and then somehow reproduced over a big speaker, a small speaker, or a phone. You have to be sensitive to the fact that someone else will be actively listening to your music under each of those circumstances. So whether it's a conversation or a piece of music, you have to ask yourself: Will they get it? Is it conveying meaning?
UBIQUITY: Do you think much about the visual side of user interaction?
KOTELLY: All the time. Visual and audio design are really the same thing; design is design in whatever medium. So I look at everything, because you can learn something from each realm. For example, I look at logos to think about conservation of line form: how do you express something cleanly and simply? The original Macintosh graphic designs had very simple and clean outlines that conveyed the simplicity and approachability of the OS. I think that to be good in design, you have to think about the design of everything. To be really good, you have to think about how everything works, because when I'm trying to convey an idea over the phone to someone, I might have to paint a picture in their head. Make it simple and clean, professional, savvy, and interesting.
UBIQUITY: Do any of your audio designs interact with visual components?
KOTELLY: Well, yes and no. Right now there's work being done here at SpeechWorks on multimodal interfaces, so that you could have a PDA that provides driving directions both by voice and visually. You might have an ear bud in which you hear directions, and a mic that allows you to give it commands, like "Repeat that." When you need something more, you pull the PDA out of your pocket and tap on it.
UBIQUITY: Who are some of the people who have influenced you?
KOTELLY: I would say that one of my biggest influences in design has been my college professor, John Kreifeldt, who invented the Reach toothbrush; he is truly amazing. Another is Judith Wechsler, my art history teacher; I took only one class from her, "History of Modern Art 1880-1914," but it was the most influential class I took. When you understand the history and progression of art all the way from neo-classical design to modern and post-modern art, you'll also see why street signs look like street signs. It's amazing to understand, in a modern context, why everything looks the way it does.
UBIQUITY: Who were some of the other people who influenced you?
KOTELLY: There are so many big designers. Paul Rand, who did a ton of logo work, had a wonderful book out. His stuff is truly brilliant. He did the NeXT logo, as well as the Morningstar and The Limited logos. His work is inspiring, and it helps you understand the evolution of a design -- to see where something starts and where it goes, and to understand what someone in that position thinks of when they first encounter a word like Morningstar. Or Coca-Cola. Coca-Cola has a great word profile, because even if it's written very, very small you can still identify it as Coca-Cola and not as something else. You can do that in audio space as well, of course. That is to say, even when things are very faint they can still have brand recognition. So many designers influenced me. I'd say the Bauhaus movement, Mies van der Rohe, Saarinen, all those mid-century designers. And then contemporary designers who do really clever interpretations: J. Mays -- currently at Ford -- who did the Volkswagen New Beetle, and Jonathan Ive, the designer of the past several Apple computers and peripherals. Incredible. And of course Jef Raskin. The interaction of the Apple OS! The philosophy of Apple Computer is immediately understandable through its design. It truly explains why everyone thinks they can be a designer -- that is to say, once they've seen the solution.
UBIQUITY: Who else?
KOTELLY: I'll forget someone important, but Don Norman, for one. And there's Henry Dreyfuss, who did all that work on trains and designed the round Honeywell thermostat; his work, of course, is really influential and important because it's something everybody grows up with. It changes the way we look at the world without our knowing it, right? So all the big people have created and defined the entire profession -- sometimes without our knowing it. As for personal influences, I'm thinking of the people whose work I can look at and find that it inspires me to do something differently or to think about something in a different way, compared to the things that I believe are fundamentals of good design. I think people like Don Norman provide the fundamentals that we have to learn, and people like Jef Raskin change how I live.
UBIQUITY: This train of thought leads to the following question: Would you think it might be possible -- and very interesting -- to create a whole Liberal Arts program that has as its core the notion of design?
KOTELLY: Yes, absolutely. I've spoken to many undergraduate students at Tufts who are not sure which major to choose -- often because they have a lot of different interests. When I tell them about Human Factors, they say, "Oh, yes, that's what I want to be doing." Human Factors is great because we, as people, design all the time, right? Everyone designs all the time -- if they're making dinner for someone, they think, "What am I going to cook for dinner?" Well, they're designing, even though they have a set of basic core templates built up in their heads: meat, vegetable, starch as a template. They have to fill that template in, right? They find a new technique or a new recipe to improve what they're doing. How to speed things up. How to make a better presentation. Why did Martha Stewart get so big? Design. We design all the time, everybody does. I think you could build up a huge curriculum just about design. Writing elegant software is design work: how you structure it so other people can use it, and how you structure it so that people can understand the structures and change them. So yes, absolutely! To me, a design core would be a wonderful way to structure a full liberal arts program.
UBIQUITY: Ending an interview is like breaking up a meeting. If we were breaking up a larger meeting at some conference, and the instructions were: "All the artists go into the room on the left, and all the scientists go into that room on the right," which room would you go to?
KOTELLY: I'd stay in the same room I was in. I wouldn't budge. I mean, anything done well is both science and art. You have no choice. Things done poorly are one or the other, but things done well are both. Art -- especially if you really use the word "art" -- doesn't need to communicate. It's not a requirement. Design, though, needs to. When you're designing something, you're communicating. The area between art and science is the space of design. So I wouldn't budge. I do art, and I do science, and I call it design.
Ubiquity, Volume 4, Issue 17, June 18 - 24, 2003
Ubiquity welcomes the submissions of articles from everyone interested in the future of information technology. Everything published in Ubiquity is copyrighted ©2002 by the ACM and the individual authors.