Press 8 for Natural Language: The Future of IVRs

Since well before 1968, when the HAL 9000 (2001: A Space Odyssey) set the public's expectations for Interactive Voice Response systems (IVRs), researchers have been building systems that respond to voice commands. In their simplest forms, these systems are cool. But no matter how cool, technologies only really take hold when they serve a business imperative. Recent hybrid systems that combine grammatical and statistical models of speech recognition now interpret input reliably and accurately. These systems have become good enough that consumers are encountering natural language input systems more and more frequently. They are replacing both human agents and the seemingly ubiquitous and annoying "press 8 for..." touch-tone menu systems.

Bought a ticket from Amtrak lately?
One approach to developing an effective natural language processing system is to "naturally" limit the vocabulary of the customer. For certain environments, such as travel ticketing, the relatively closed set of words or tokens that the system needs to recognize simplifies the problem dramatically. For these domains, the computerized system can often negotiate the entire transaction. However, for more complicated dialogues, such as customer care and billing, the increased choices and substantially wider vocabulary make recognition less robust. Is there still a role for natural language recognition systems in these more complicated interaction environments? Suhm and colleagues believe so. They compared the efficacy of voice interaction with that of touch-tone input. The comparison focused on a system that uses voice recognition and categorization just to route the call to the right real person (Suhm, Bers, McCarthy, Freeman, Getty, Godfrey and Peterson, 2002). In their experiment, callers who used the baseline touch-tone menu system indicated their initial choices by selecting their desired routing from a list of options. The researchers compared that group with a random subset of callers who were redirected to the speech-enabled IVR. Instead of hearing a list of options to select from, callers in the speech group were instructed to "Please tell [the system] briefly, the reason for [their] call." (This prompt elicits more precise and interpretable responses than the more common "May I help you?" according to unpublished research by Suhm et al.) Based on key words in the caller's response, the system would categorize the caller's need and route the call to either a specific agent or an automated fulfillment system. Suhm and colleagues collected data from 95,904 callers who used the touch-tone IVR and 3,759 callers who experienced the natural language router.
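To make the idea of key-word routing concrete, here is a minimal sketch in Python. It is not the system Suhm and colleagues built: the destinations, the key words, and the function name `route_call` are all invented for illustration, and a production router would work from recognizer output and a statistical classifier rather than a plain word lookup. A response that contains no known key word gets no category here, so a real system would need some fallback for that case.

```python
import re
from typing import Optional

# Hypothetical destinations and key words, invented for illustration only.
ROUTES = {
    "billing": {"bill", "charge", "payment", "invoice"},
    "repair": {"broken", "outage", "repair", "static"},
    "orders": {"order", "install", "upgrade"},
}


def route_call(utterance: str) -> Optional[str]:
    """Return the destination whose key words best match the utterance.

    Returns None when no key word is recognized.
    """
    # Lowercase the response and split it into a set of distinct words.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best, best_hits = None, 0
    for destination, keywords in ROUTES.items():
        hits = len(words & keywords)  # number of distinct key words heard
        if hits > best_hits:
            best, best_hits = destination, hits
    return best
```

For example, `route_call("I have a question about a charge on my bill")` picks the hypothetical "billing" destination because two of its key words appear in the response.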
Overall, the accuracy rates for the first decision point were similar: typically a well-designed touch-tone system yields a 70-75% first-choice accuracy rate; the speech-based system correctly categorized the call topic 78% of the time. However, other benefits of the natural language IVR emerged immediately: 88.5% of callers invited to describe their reason for calling responded by doing so. In contrast, only 75.1% of callers to the touch-tone system entered an initial selection. The remaining 24.9% immediately pressed "0" to escape the touch-tone system. Because it occasionally failed to recognize any key words in the caller's response, the speech-enabled system re-prompted callers more frequently than the touch-tone system. This re-prompting lengthened the call in the speech-enabled system, a taboo outcome for call center optimization. However, despite this increase, the overall average routing time for the natural language system was less than half that of the touch-tone IVR (16.5 seconds vs. 35.9 seconds, respectively). Further, callers got to the right place on the first try: the natural language system was able to route callers to a more specific destination with fewer misdirects. This improvement is significant, since every avoided misdirection saves the approximately 164 seconds required for callers to repeat their reason for calling to each new agent they are directed to. Overall, Suhm and colleagues concluded that the natural language system improved the user experience, routing callers more accurately and more quickly to the right place. Users rated the speech system very positively, clearly preferring it to the touch-tone system in follow-up surveys.
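The timing figures above are easy to check with a few lines of arithmetic. The sketch below uses only the numbers reported in the article; the helper `time_saved_s` and its `avoided_misdirects` parameter are illustrative assumptions, since the article does not report how many misdirects each system produced.

```python
# Figures reported in the article (seconds).
TOUCH_TONE_ROUTING_S = 35.9
SPEECH_ROUTING_S = 16.5
REPEAT_COST_S = 164  # approximate cost of one misdirected transfer

# Speech routing time as a fraction of touch-tone routing time (about 0.46).
ratio = SPEECH_ROUTING_S / TOUCH_TONE_ROUTING_S

# Average routing time saved per call (about 19.4 seconds).
saved_per_call_s = TOUCH_TONE_ROUTING_S - SPEECH_ROUTING_S


def time_saved_s(avoided_misdirects: int) -> int:
    """Caller time saved by avoiding a given number of misdirects.

    The number of avoided misdirects is a hypothetical input; the
    article gives only the per-misdirect cost.
    """
    return avoided_misdirects * REPEAT_COST_S
```

The ratio comes out below 0.5, which is what "less than half" refers to, and each avoided misdirect is worth roughly eight times the routing time saved on an average call.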

Talk less... and slower
In a similar study, Delude (2002) explored the interaction between aging and mode of input (touch-tone or voice). In her study, 22 university students and 22 seniors performed one task on each of 6 IVR systems (5 touch-tone systems and 1 voice-activated system). The scenario trials were followed by a usability questionnaire. All participants in her study completed at least one of the 6 IVR tasks. Interestingly, the distribution of success differed greatly between younger and older participants. Of the younger participants, 82% completed 5 or 6 of the 6 tasks. While 32% of older participants completed 5 or 6 of the 6 tasks, 50% could complete only 1 or 2. This suggests that while many older individuals can clearly navigate IVRs successfully, the requirements of navigating IVR interfaces highlight individual differences associated with cognitive aging. The types of challenges that users faced on the IVRs were similar for both younger and older participants. They included having difficulties with:
Among these, older individuals were most challenged by:
Most challenging for older individuals was that these difficulties tended to compound. Users who could not keep up with the choice alternatives tended to make errors that they could not recover from. In fact, overall, younger and older participants behaved similarly, except that older individuals were not typically able to recover from errors. For this study, the researchers predicted that participants would succeed more frequently on tasks that required fewer choices. This prediction held true for touch-tone input systems. However, the voice-driven system, which required the second-highest number of choices to complete, produced the highest success rate of all the tasks. According to Delude, "This exceptional result suggests that voice-activated IVRs do not follow the same rules as touch-tone IVRs."
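Because each age group in Delude's study had only 22 participants, it can help to turn the reported completion percentages back into approximate head counts. The sketch below does that in Python. The "3-4 tasks" band for older participants is not stated in the article; it is inferred here as the remainder, on the assumption that the two reported bands and the statement that everyone completed at least one task account for the whole group. The counts are rounded estimates, not figures from the paper.

```python
GROUP_SIZE = 22  # participants per age group, as reported


def to_count(percent: float, n: int = GROUP_SIZE) -> int:
    """Convert a reported percentage into an approximate head count."""
    return round(percent / 100 * n)


# Reported percentages for older participants.
older = {"5-6 tasks": 32, "1-2 tasks": 50}
# Inferred remainder (assumption): everyone else completed 3 or 4 tasks.
older["3-4 tasks"] = 100 - sum(older.values())

older_counts = {band: to_count(p) for band, p in older.items()}
younger_high = to_count(82)  # younger participants completing 5 or 6 tasks
```

On these estimates, roughly 18 of the 22 younger participants completed 5 or 6 tasks, compared with roughly 7 of the 22 older participants, while about 11 older participants completed only 1 or 2.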

Evaluating usability of voice-activated IVRs
Peissner (2002) suggests that the usability of natural language interaction systems will be determined by the interplay between:
So how can usability specialists decide the best approach to improving the user experience? Should they focus on tuning the voice recognition system, or on re-engineering and enhancing the dialogue? To answer this question, usability specialists will have to develop methods for assessing the impact of word recognition accuracy and dialogue design effectiveness. This will allow us to allocate our resources in the most effective way to enhance the overall usability of the system.

References
Delude, L. (2002). Automated telephone answering systems and aging. Behaviour & Information Technology, 21(3), 171-184.
Peissner, M. (2002). What the relationship between correct recognition rates and usability measures can tell us about the quality of a speech application. Paper presented at Work With Display Units (WWDU).
Roush, W. (2003). Computers that speak your language. Technology Review, June, 23-39.
Suhm, B., Bers, J., McCarthy, D., Freeman, B., Getty, D., Godfrey, K., and Peterson, P. (2002). A comparative study of speech in the call center: Natural language call routing vs. touch-tone menus. Paper presented at ACM SIGCHI, Minneapolis, Minnesota.

The Pragmatic Ergonomist, Dr. Eric Schaffer
Natural language interfaces make me nervous. If the interface deals with tasks that lend themselves to a naturally small (or better, closed) vocabulary, they may work. In time their use will be common, but this will require further refinement of the speech recognition algorithms, and possibly learning on the part of the user. What I mean by learning is this. Google has an unlabeled field with a button labeled "Google Search" in the middle of the page. There are no instructions. But people think up the most distinctive words they can, and they often use combinations. Most leave out words like "the". Users have "learned" to work with this type of generalized search field. People will adapt to natural language interfaces in the same way. For example, they may reduce the number of articles they use in their speech. As with search engines, they will learn to pick the most common and discriminating terms for what they want. But this learning has not taken place yet.