Interactive Voice Response (IVR) Case Study

Testing Your Telephone-Based e-Commerce Support (continued)

published in The Journal of Electronic Commerce, Volume 12, Number 2

 

The chart below displays the performance results for easy comprehension. Producing the chart requires tabulating the results first (see the analysis below).

IVR Chart 1

Figure 1

Below, we display subject comments for the two tasks failed by all participants but one. Typically, subjects are not clear about the source of their confusion, so the designer or usability specialist must evaluate each comment for its design significance. Occasionally, a subject offered no comment. We present only these two tasks as examples.

13 Failed to "Change PIN" (Task 1; each bullet is from a different subject who failed.)

  • I want to update my card, not a service.
  • Lots of options and you would need it in writing. I'm not sure if the menus are organized in related groups.
  • Use fewer words in sentences.
  • Never found words I could recognize as a similar term.
  • Nothing was clear. Too many options per number on the menu.
  • PIN option wasn't there. I was reaching my limits of patience.
  • Was a little confusing. Voice is overly sweet.
  • He says too much for each option.
  • They assume you know what the words talk about.
  • There was nothing specific for calling cards. I got lost.
  • All the talking seems to slur together.
  • I don't know where to change PIN.

13 Failed to "Review Fees" (Task 2; each bullet is from a different subject who failed.)

  • Was easy to do.
  • Lots of choices that had nothing to do with what I wanted to do.
  • Would be nice to have it (CSR option) on the main menu, but I can see why they don't. Not so satisfied, a little frustrating.
  • Got a little lost in messages
  • This was easy task because I am becoming familiar
  • Felt satisfied with speed of getting an operator
  • Going through all options is a waste if I get a cs rep anyway.
  • Opening summary paragraph is annoying. It isn't helpful.
  • I feel a ... frustration which is not the kind a business wants.
  • What's the point of the first menu ... you can't make any selection I wanted a CS rep option on the first menu...
  • From the Phones-R-Us side, it's OK that it took so long, but I get frustrated.

Recall that we also collected statements regarding the subjects' "overall" impressions. Note that, in the following, 6 subjects gave favorable statements. They represent 43% of the 14 subjects. Clearly, subjective impressions can be misleading even in the face of severe usability problems.

6 Positive Post-test "Overall Impressions" (Each bullet is from a different positive subject.)

  • The system is quite well designed
  • It's good
  • Relatively usable menu that doesn't waste too much time
  • Needs work. Not bad though
  • OK, but needs more improvement
  • Professional corporate attempt to address a very complicated customer service need – showed much work and effort, yet still room for improvement

8 Negative Post-test "Overall Impressions" (Each bullet is from a different negative subject.)

  • A little confusion – a lot offered, but not easily accessible
  • Irrelevant, I always ended up going to Customer Service, yet at the beginning there was no option for Customer Service. The whole system couldn't help on half the problems. I could just ask customer service at the beginning and save time.
  • A bit too much information on the first menu. Could there be a simpler way to get started?
  • Slow. They're more interested in mileage than phone service. Structure not completely clear to find what is needed.
  • Complicated because main menu difficult to understand
  • Took too long to get anywhere
  • Frustration and disease. Feeling of confusion and too much info at each stage.
  • Very messy menus. Unclear and confusing

Step 4. Collect Satisfaction Data

After the test protocol, each subject filled out a satisfaction questionnaire. (See next page.) Because each subject had just attempted 10 tasks, they could easily reflect on their subjective reactions. The questionnaire represents five categories of satisfaction (discussed below). We adapted the original questionnaire (drawn from other sources) to accommodate IVR technology.

Step 5. Analyze Performance Data

In this case, our goal is to present data showing the extent of the IVR problems. The results guide the decision whether to make design changes. The changes themselves require expert knowledge of IVR design. The following summary represents this phase of the test.

Tasks

1. Change PIN
2. Review fees per FTA
3. Learn MCI & Airmiles
4. Get code for Austria
5. Check overbilling
6. Check home rates
7. Add personal 800#
8. Learn France to Japan codes
9. Get credit for operator error
10. Get CSR

Overall Results

  • 10 tasks × 14 subjects = 140 test events
  • 10 items together averaged 27.1% passing rate
  • 8 worst items averaged 11.6% passing rate
  • 40% of all failures occur on the first menu item

We scored any task as "fail" if the subject used a different IVR option than what the designer intended. Often, a subject would get a CSR, thinking it was a planned event. While the subject felt a positive outcome, the IVR had failed. We also logged the level of menu at which the subject failed to press the correct phone button. Notably, 40% of all the failures occurred on the first menu item – a prime target for improvement.

The 140 "test events" constitute the body of evidence. Our data analysis showed that subjects passed only 27.1% of them (38 of the 140 events). Because 22 of those 38 passes occurred on only 2 of the test items, we also restated the results without those items: the 8 worst items earned only an 11.6% passing rate (16 out of 112 test events). Failure at this rate is expensive, because the CSR staff has to work hard to keep up with the callers asking for human help.
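The scoring rule and the tabulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the `TestEvent` record, its field names, and the key sequences in the usage example are hypothetical, and the sketch assumes each task has a single designer-intended key path.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TestEvent:
    """One subject attempting one task (hypothetical record layout)."""
    intended: Sequence[str]  # key presses the designer intended
    pressed: Sequence[str]   # key presses the subject actually made


def passed(event: TestEvent) -> bool:
    # A task passes only if the subject followed the intended path;
    # reaching a CSR by some other route still counts as a fail.
    return list(event.pressed) == list(event.intended)


def failure_level(event: TestEvent) -> Optional[int]:
    """Return the 1-based menu level of the first wrong key, or None on a pass."""
    if passed(event):
        return None
    pairs = zip(event.intended, event.pressed)
    for level, (want, got) in enumerate(pairs, start=1):
        if want != got:
            return level
    # Keys agree as far as both sequences go, so the failure is at the
    # first level where one sequence has a key and the other does not.
    return min(len(event.intended), len(event.pressed)) + 1


def pass_rate(passes: int, events: int) -> float:
    return passes / events


# Hypothetical key sequences, for illustration only.
events = [
    TestEvent(intended=["2", "4"], pressed=["2", "4"]),  # pass
    TestEvent(intended=["2", "4"], pressed=["0"]),       # wrong key on first menu
    TestEvent(intended=["2", "4"], pressed=["2", "1"]),  # wrong key on second menu
]
levels = [failure_level(e) for e in events]  # [None, 1, 2]

# Figures reported in the article: 10 tasks, 14 subjects, 38 passes.
total_events = 10 * 14
overall = pass_rate(38, total_events)
print(f"{overall:.1%}")  # 27.1%
```

Logging the failure level for every event is what makes a statement such as "40% of all failures occurred on the first menu item" possible: it is the share of failed events whose level is 1.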

Step 6. Analyze Satisfaction Data

We grouped and averaged the data from the satisfaction questionnaire as follows. Where a question was worded so that agreement implied a negative response, we converted the answer mathematically so that its direction matched the meaning of the category description. If you use the questionnaire, calculate the mean of the following questions for each category.

a. Learnability: 1, 6, 12, 18, 20 (high priority)
b. Efficiency: 2, 4, 9, 11, 17, 21 (high priority)
c. Control: 3, 7, 8 (low priority)
d. Helpfulness: 10, 13, 15, 16 (low priority)
e. Likability: 5, 14, 19 (medium priority)

Global Metric: 0.3a + 0.3b + 0.1c + 0.1d + 0.2e
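As a rough sketch of the calculation above, the Python below converts negatively worded items, averages one category, and combines the five category means into the global metric. The weights come from the formula above. Everything else is an assumption made for illustration: the 1-to-7 answer scale, the `8 - answer` style of conversion (the article says only that such answers were "mathematically converted"), the choice of which question is negatively worded, and all sample answers and category means.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 7  # assumed answer scale

# Weights taken from the global metric formula; they sum to 1.0.
WEIGHTS = {
    "learnability": 0.3,
    "efficiency": 0.3,
    "control": 0.1,
    "helpfulness": 0.1,
    "likability": 0.2,
}


def reverse_score(answer: int) -> int:
    """Flip an answer on the assumed 1-7 scale (7 -> 1, 4 -> 4, 1 -> 7)."""
    return SCALE_MIN + SCALE_MAX - answer


def category_mean(answers, questions, negative=()):
    """Mean of one subject's answers for the listed question numbers.

    `answers` maps question number -> raw answer; questions listed in
    `negative` are reverse-scored before averaging.
    """
    scores = [
        reverse_score(answers[q]) if q in negative else answers[q]
        for q in questions
    ]
    return mean(scores)


def global_metric(category_means):
    """Weighted sum 0.3a + 0.3b + 0.1c + 0.1d + 0.2e."""
    return sum(WEIGHTS[name] * category_means[name] for name in WEIGHTS)


# Hypothetical answers for the Control questions (3, 7, 8);
# treating question 7 as negatively worded is an assumption.
answers = {3: 5, 7: 2, 8: 4}
control = category_mean(answers, questions=(3, 7, 8), negative=(7,))
# control is (5 + 6 + 4) / 3 = 5

# Hypothetical means for the other four categories.
means = {
    "learnability": 4.0,
    "efficiency": 3.0,
    "control": control,
    "helpfulness": 4.0,
    "likability": 5.0,
}
score = global_metric(means)
print(round(score, 2))  # 4.0
```

Because the weights sum to 1.0, the global metric stays on the same scale as the individual answers, so it can be read against the same neutral point as any single category.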

We established a yardstick of positive merit based on the nature of the scale. Recall that the subjects selected a number from 1 to 7, with 4 representing the neutral point. The next interval above the neutral 4 is 5. Therefore, we treated a score of 5 or above as indicating a "positive" rating. (See Figure 2.)
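The yardstick amounts to a simple threshold check, sketched below in Python. The function name and the sample means are hypothetical and are not results from the study.

```python
NEUTRAL = 4             # midpoint of the 1-7 scale
POSITIVE_THRESHOLD = 5  # next whole interval above neutral


def is_positive(mean_score: float) -> bool:
    """True when a mean rating reaches the 'positive' yardstick of 5."""
    return mean_score >= POSITIVE_THRESHOLD


# Hypothetical category means, for illustration only.
sample = {"learnability": 3.8, "control": 4.6, "likability": 5.2}
ratings = {name: is_positive(value) for name, value in sample.items()}
# ratings == {"learnability": False, "control": False, "likability": True}
```

Note that under this rule a mean such as 4.6 sits above the neutral point but still does not count as positive.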

IVR Chart 2

Figure 2

© 1996-2012 Human Factors International, Inc. All rights reserved