Insights from Human Factors International

In This Issue:

Press 8 for Natural Language: The Future of IVRs
Kath Straub, Ph.D., CUA, Chief Scientist of HFI, discusses the pros and
cons of "natural language" interactive voice response systems.

The Pragmatic Ergonomist
Dr. Eric Schaffer, Ph.D., CPE, founder and CEO of HFI, offers practical
advice.

Since well before 1968, when the HAL 9000 (2001: A Space Odyssey) set
the public's expectations for Interactive Voice Response systems (IVRs),
researchers have built systems that respond to voice commands. Even in
their simplest forms, these systems are cool. But no matter how cool,
technologies only really take hold when they serve a business imperative.
Recent hybrid systems that combine grammatical and statistical models of
speech recognition now interpret input reliably and accurately. These
systems have become good enough that consumers encounter natural language
input systems more and more frequently. They are replacing both human
agents and the seemingly ubiquitous and annoying "press 8 for..."
touch-tone menu systems.

Bought a ticket from Amtrak lately?

One approach to developing an effective natural language processing system
is to "naturally" limit the vocabulary of the customer. For
certain domains, such as travel ticketing, the relatively closed
set of words or tokens that the system needs to recognize simplifies the
problem dramatically. In these domains, the computerized system can often
negotiate the entire transaction. However, for more complicated dialogues,
such as customer care and billing, the increased choices and substantially
wider vocabulary make recognition less robust. Is there still a role for
natural language recognition systems in these more complicated interaction
environments?
Suhm and colleagues believe so. They compared the efficacy of voice
interaction with that of touch-tone input, focusing on a system that uses
voice recognition and categorization just to route the call to the right
real person (Suhm, Bers, McCarthy, Freeman, Getty, Godfrey and Peterson,
2002). In their experiment, callers who used the baseline touch-tone menu
system indicated their initial choices by selecting their desired routing
from a list of options. The researchers compared that group with a random
subset of callers who were redirected to the speech-enabled IVR. Instead
of hearing a list of options to select from, callers in the speech group
were instructed to "Please tell [the system] briefly, the reason for
[their] call." (According to unpublished research by Suhm et al., this
prompt elicits more precise and interpretable responses than the more
common "May I help you?") Based on key words in the caller's response,
the system would categorize their need and route them to either a specific
agent or an automated fulfillment system. Suhm and colleagues collected
data from 95,904 callers who used the touch-tone IVR and 3,759 callers who
experienced the natural language router.
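The keyword-based categorization described above can be pictured with a minimal sketch. This is an illustration only, not the system Suhm and colleagues built: the article does not list the study's categories or vocabulary, so the `ROUTES` table, the destination names, and the `route_call` function are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword-to-destination table. The study's real categories
# and vocabulary are not given in the article; these are placeholders.
ROUTES = {
    "billing": {"bill", "charge", "payment", "invoice"},
    "repair": {"broken", "outage", "repair", "static"},
    "new_service": {"order", "install", "upgrade"},
}


def route_call(utterance: str) -> Optional[str]:
    """Return the destination whose keywords best match the utterance.

    Returns None when no keyword is recognized, which is the case in
    which a router of this kind would have to re-prompt the caller.
    """
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    # Score each destination by how many of its keywords were spoken.
    scores = {dest: len(words & keys) for dest, keys in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Under these assumptions, a response such as "I have a question about a charge on my bill" maps to the billing destination, while a response containing none of the listed key words yields `None` and would trigger a re-prompt.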
Overall, the accuracy rates for the first decision point were similar:
a well-designed touch-tone system typically yields a 70-75% first-choice
accuracy rate, and the speech-based system correctly categorized the call
topic 78% of the time.
However, other benefits of the natural language IVR emerged immediately:
88.5% of callers invited to describe their reason for calling responded
by doing so. In contrast, only 75.1% of callers to the touch-tone system
entered an initial selection. The remaining 24.9% immediately pressed
"0" to escape the touch-tone system.
Because it occasionally failed to recognize any key words in the caller's
response, the speech-enabled system re-prompted callers more frequently
than the touch-tone system. This re-prompting lengthened the call in the
speech-enabled system, a taboo outcome for call center optimization.
However, despite this increase, the overall average routing time for the
natural language system was less than half that of the touch-tone IVR
(16.5 seconds vs. 35.9 seconds, respectively). Further, callers more often
got to the right place on the first try: the natural language system was
able to route callers to a more specific destination with fewer misdirects.
This improvement is significant, since every avoided misdirection saves the
approximately 164 seconds that callers need to repeat their
reason for calling to each new agent they are directed to.
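To see how the routing time and the misdirection penalty combine, here is a rough back-of-the-envelope sketch. It rests on assumptions the article does not state: it treats one minus the first-decision accuracy (78% for speech, and the 72.5% midpoint of the 70-75% range for touch-tone) as the probability of a misdirect, and it charges exactly one 164-second penalty per misdirected call. The function name `expected_overhead` is hypothetical.

```python
MISDIRECT_PENALTY_S = 164.0  # approximate cost of one misdirection (from the article)


def expected_overhead(routing_s: float, misdirect_rate: float,
                      penalty_s: float = MISDIRECT_PENALTY_S) -> float:
    """Average seconds per call spent on routing plus misdirection.

    Assumes at most one misdirect per call, each costing penalty_s.
    """
    return routing_s + misdirect_rate * penalty_s


# Assumption: misdirect rate = 1 - first-decision accuracy.
speech = expected_overhead(16.5, 1 - 0.78)
touch_tone = expected_overhead(35.9, 1 - 0.725)

print(f"speech: {speech:.1f} s, touch-tone: {touch_tone:.1f} s")
```

On these assumptions the speech router comes out at roughly 53 seconds of overhead per call against roughly 81 seconds for touch-tone. The exact figures matter less than the point that a small accuracy gain is amplified by the high cost of each misdirect.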
Overall, Suhm and colleagues concluded that the natural language system
improved the user experience, routing callers more accurately and more
quickly to the right place. Users rated the speech system very positively,
clearly preferring it to the touch-tone system in follow-up surveys.

Talk less... and slower

In a similar study, Delude (2002) explored the interaction between aging
and mode of input (touch-tone or voice). In her study, 22 university
students and 22 seniors performed one task on each of six IVR systems (five
touch-tone systems and one voice-activated system). The scenario trials
were followed by a usability questionnaire.
All participants in her study completed at least one of the six IVR tasks.
Interestingly, the distribution of success differed greatly between younger
and older participants: 82% of younger participants completed five or six
of the six tasks, whereas only 32% of older participants did so, and 50%
of older participants could complete only one or two. This suggests
that while many older individuals can navigate IVRs successfully,
the requirements of navigating IVR interfaces highlight individual
differences associated with cognitive aging.
The types of challenges that users faced on the IVRs were similar for
both younger and older participants. They included having difficulties
with:
- Confusing choices or instructions
- Options/Voice being presented too quickly
- Introductory content or menu items too long
- Voice data entry problems (Voice recognition failure or failure to
follow instructions)
- Recovery from errors
- Keystroke data entry problems
- Use of jargon
Among these, older individuals were most challenged by:
- the speed of presentation,
- failure to follow instructions,
- difficulty understanding jargon,
- difficulty with selection entry,
- and the inability to recover from errors.
Most challenging for older individuals was that these difficulties tended
to compound. Users who could not keep up with the choice alternatives
tended to make errors that they could not recover from. In fact, overall,
younger and older participants behaved similarly except that older individuals
were not typically able to recover from errors.
For this study, the researchers predicted that participants would succeed
more frequently on tasks that required fewer choices. This prediction
held true for the touch-tone systems. However, the voice-driven system,
which required the second-highest number of choices to complete, produced
the highest success rate of all the tasks. According to Delude,
"This exceptional result suggests that voice-activated IVRs do not
follow the same rules as touch-tone IVRs."

Evaluating usability of voice-activated IVRs

Peissner (2002) suggests that the usability of natural language interaction
systems will be determined by the interplay between:
- the accuracy of the speech recognition, and
- the usability of the decision dialogue.
So how can usability specialists decide the best approach to improving
the user experience? Should they focus on tuning the voice recognition
system, or on re-engineering and enhancing the dialogue?
To answer this question, usability specialists will have to develop methods
for assessing the separate impacts of word recognition accuracy and dialogue
design effectiveness. This will allow us to allocate our resources in the
most effective way to enhance the overall usability of the system.

References

Delude, L. (2002). Automated telephone answering systems and aging. Behavior
and Information Technology, 21(3), 171-184.
Peissner, M. (2002). What the relationship between correct recognition
rates and usability measures can tell us about the quality of a speech
application. Paper presented at Work With Display Units (WWDU).
Roush, W. (2003). Computers that Speak your Language. Technology
Review, June, 23-39.
Suhm, B., Bers, J., McCarthy, D., Freeman, B., Getty, D., Godfrey, K.,
and Peterson, P. (2002). A Comparative Study of Speech in the Call Center:
Natural Language Call Routing vs. Touch-tone Menus. Paper presented at
ACM SIGCHI, Minneapolis, Minnesota.

The Pragmatic Ergonomist

Natural language interfaces make me nervous. If the interface deals with
tasks that lend themselves to a naturally small (or better, closed) vocabulary,
they may work. In time their use will be common, but this will require
further refinement to the speech recognition algorithm, and possibly learning
on the part of the user.
What I mean by learning is this. Google has an unlabeled field with a
button labeled "Google Search" in the middle of the page. There
are no instructions. But people think up the most distinctive words they
can, and they often use combinations. Most leave out words like "the".
Users have "learned" to work with this type of generalized search
field.
People will adapt to natural language interfaces in the same way. For
example, they may reduce the number of articles they use in their speech.
As with search engines, they will learn to pick the most common and
discriminating terms for what they want. But this learning has not taken
place yet.

Bernhard Suhm
Call Center Services and Speech Solutions
BBN Technologies, a Verizon Company

Congratulations on your excellent write-up on an important issue in the
design of telephone voice user interfaces in your UI Design Update, July
2003!
You discuss an important tradeoff: a more "directed" dialogue,
which steers callers towards saying just a few words, vs. an "open-ended"
dialogue, which (seemingly) opens up the caller to say anything they like.
The truth is that even with such "open-ended" prompts, what
callers really do say is within a quite well bounded subset of general
language, and only that fact makes it possible to develop systems that
accurately interpret responses to open-ended prompts.
So how can usability specialists decide the best approach to improving
the user experience? Should they focus on tuning the voice recognition
system, or on re-engineering/enhancing the dialogue?
I'd like to point out that some of the questions raised at the end of
the essay have already been studied, some in our own research. Our answer
to the questions raised is: both need attention, but the key is to obtain
information from end-to-end calls, comprising both the complete user-IVR
interaction and key pieces of information from any user-agent
dialog that might follow. Refer to [Suhm, Peterson 2002: A Data-Driven
Methodology for Evaluating and Optimizing Call Center IVRs, International
Journal of Speech Technologies, Vol. 5, #1, pg. 23-37].

Sallee Garner

In the comparisons, was any attempt made to compare the success/failure
rates of different models of telephones? Part of the appeal of voice recognition
is that you get to keep holding the receiver in a constant position where
you can hear the prompts. The touchtone "press 4 / enter your PIN
/ spell your name" options become more difficult with phones whose
buttons are integrated into the receiver, and I suspect the difficulty
for the elderly (or anyone with impaired hearing) would be even greater.
For example, my office phone system has a number you can call to reach
an automated system where you are invited to spell the name of the person
you are calling, using the phone's keypad. If the last name is not distinctive
enough, you are instructed to continue spelling the first name. As soon
as you have entered enough letters to make a unique pattern (an unpredictable
number of letters), you get another prompt, which is very hard to hear
if you are trying to spell a name on the buttons of your cell phone –
holding it away from your ear – while riding a noisy subway. Failure
to respond correctly may cause you to call the wrong person or to have
to start over. On the other hand, voice recognition when using that same
cell phone could be a problem if reception is poor.