Interactive Voice Response (IVR) Case Study

Testing Your Telephone-Based e-Commerce Support (continued)

published in The Journal of Electronic Commerce, Volume 12, Number 2

 


 

We've already given you the punch line: the low-fidelity test performed as well as the high-fidelity test. In fact, the researchers wrote that "we would not have spent the time and effort to build a high-fidelity prototype" if their only goal had been usability testing. (The project did have other goals.) They added, "In fact this is how we currently design IVR systems in practice."

To encourage you in future analyses, here are some of the measures that showed group equivalency. First, the experimenters identified 21 problems with the IVR interface. Comparing the two groups, they found no differences in...


  • types of problems the subjects uncovered – the hi-fi group found 19 problems, the lo-fi found 20.
  • sensitivity of the tests – the number of subjects locating each problem was about the same for each group
  • severity of the problems – "eye ball" examination revealed no striking differences in the ability of the lo-fi group to uncover severe problems (both groups uncovered nearly all the problems)

The authors list issues for which a high-fidelity prototype can be useful. However, mockups limited to specific questions could serve as well. A prototype or mockup can test...

  • intelligibility of the selected voice or speech synthesis
  • concatenation of prompts by the caller
  • time to complete menu selections and other performance measures
  • display characteristics (e.g., font, images, colors)
  • marketing personnel reactions (they like to see verisimilitude – the real thing!)

Furthermore, investment in a prototype can enhance...

  • demonstrations for marketing purposes
  • uncovering specifications that may not otherwise be obvious
  • review of features and functions for documentation and training design purposes

How to Conduct an IVR Usability Test

Introduction

The following steps and data come from a demonstration project that Human Factors International, Inc. conducted on an IVR serving a telecommunications firm that we will call Phones-R-Us. Expert review indicated significant potential for user confusion and consequent overload of the CSR staff. Subjects came from a university population – students and staff.

Use the following steps for your tests. Remember the "Wizard of Oz" technique given above – you don't need an operational IVR, although in this test we used one. Here's an overview of how you could present your findings.

IVR Usability Test

  • Subjects
  • Tasks
  • Performance Results
  • Satisfaction Results
  • Next Steps

Step 1. Get Subjects

Choose the number of subjects to match the expected probability of finding a given problem. Big problems need fewer subjects; subtle problems need more. Experience indicated that 10-20 subjects would provide insight into the problems we anticipated. Our intern tested 16 subjects with telephone experience and varied educational backgrounds and genders. He used 2 of the sessions to learn to write down the subjects' comments rapidly and concisely, so we used data from the remaining 14 subjects. Our intern videotaped five of the interviews in case we wanted to demo the process.
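The link between sample size and the chance of finding a problem can be sketched with a simple model. The article does not state a formula, so the following is an assumption: each subject independently runs into a given problem with probability p, which gives a chance of 1 - (1 - p)^n that at least one of n subjects finds it. The function names and the example probabilities are hypothetical.

```python
def detection_probability(p: float, n: int) -> float:
    """Chance that at least one of n subjects hits a problem.

    Assumes (not from the article) that each subject independently
    encounters the problem with probability p.
    """
    return 1.0 - (1.0 - p) ** n


def subjects_needed(p: float, target: float) -> int:
    """Smallest number of subjects whose detection probability reaches target."""
    if not (0.0 < p <= 1.0 and 0.0 < target < 1.0):
        raise ValueError("p must be in (0, 1] and target in (0, 1)")
    n = 1
    while detection_probability(p, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical values: an obvious problem (p = 0.3) versus a subtle one (p = 0.1).
    print(subjects_needed(0.3, 0.9))
    print(subjects_needed(0.1, 0.8))
    print(round(detection_probability(0.1, 14), 3))
```

Under this model, an obvious problem surfaces with a handful of subjects, while a subtle one needs a group in the 10-20 range, which is consistent with the rule of thumb above.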

If needed, provide training to give your subjects the same expertise your actual users have. (If you expect a specific background, then recruit – and pay – subjects from your user population.) In our case, subjects only needed experience using a telephone and had to be old enough to qualify for a telephone card. Here's a subject selection summary:

Subjects

  • 14 subjects from a university setting.
  • Represents cross-section of US population
  • 4 Female (29%)
  • 4 English as a second language (ESL)
  • Ages 41-50:4; 31-40:1; 21-30:7; 17-20:2
  • PhD:1; MA:2; BA:4; HS:7
  • Homogeneous: ESL subjects gave satisfaction ratings similar to those of native English speakers (5 NSDs)

Comment: During data analysis (see below) we wanted to see if ESL made a difference in how subjects felt about the IVR menu. Therefore, we used a statistical test to check for differences between the two groups' average scores on each of the 5 satisfaction ratings (given below). "NSD" means No Significant Difference at the 0.05 level (the so-called "95% confidence" level): a difference as large as the one we observed would turn up by chance more than 1 time in 20 even if the two groups did not really differ. We used the t-test for unequal variances found in Microsoft Excel. You don't need such confirmation if your own group of test subjects has no particular differentiating characteristic.
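For readers without Excel, the t-test for unequal variances (often called Welch's t-test) can be sketched in a few lines. This is a minimal illustration, not the analysis we ran: the function name is hypothetical and the ratings in the usage example are made-up placeholders, not data from this study. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom; converting those to a p-value still requires a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a two-sample t-test with unequal variances.

    a and b are sequences of ratings, each with at least two values.
    """
    # Squared standard error contributed by each group (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Placeholder ratings only; NOT the study's data.
    esl_ratings = [1, 2, 3, 4]
    native_ratings = [2, 3, 4, 5]
    t, df = welch_t(esl_ratings, native_ratings)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

If the absolute value of t is smaller than the two-tailed critical value for the computed degrees of freedom at the 0.05 level, the comparison counts as an NSD.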

Step 2. Determine the Tasks and Test Script

Based on preliminary expert review, we had specific issues we wanted to test. Were our suspicions correct? What percentage of average users would have difficulty? One of us devised 10 test scenarios to meet these needs. Concrete language and specific instances make the test more valid. If necessary, provide any paper documents that would normally be used, such as a credit card statement.

On the next page is our test script, with the task scenarios. The data supervisor read the script to maintain consistency of expectation and motivation among subjects.

Step 3. Collect Performance Data

Our intern spent about an hour with each subject. He recorded the demographic data (indicated above), then administered the test script. He used a speakerphone so that he could hear the IVR prompts. (Remember, if this were a "low-fidelity" test he would have read out the prompts himself.) He asked the subjects to tell him which button they pressed. He made a point to record the button presses in sequence for each test question. He also recorded his observations and useful subject comments for each button press. At the end of each task, he asked them to describe their experience and degree of difficulty.

Note that a subject may have felt they completed the task correctly because they reached a CSR – whether by accident or on purpose. We nevertheless scored such attempts as failures because the subject didn't follow the intent of the design. Subjects had no difficulty with the presence of videotape equipment and its operation.
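If you keep the session log electronically, one possible layout is sketched below. It is an illustration only: the class, field names, and key sequences are hypothetical, and treating "followed the intent of the design" as an exact match against one intended key sequence is a simplifying assumption you may need to relax for menus with several valid paths.

```python
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """Button presses and observer notes for one subject on one task."""
    task_id: int
    intended_path: tuple[str, ...]  # key sequence the design intends (hypothetical)
    presses: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)

    def press(self, key: str, note: str = "") -> None:
        """Log a button press in sequence, with an optional observation."""
        self.presses.append(key)
        if note:
            self.notes.append(f"{key}: {note}")

    def passed(self) -> bool:
        """Pass only if the presses match the intended path exactly.

        Reaching a CSR by another route therefore scores as a fail.
        """
        return tuple(self.presses) == self.intended_path


def pass_rate(records: list[TaskRecord]) -> float:
    """Fraction of records scored as pass."""
    return sum(r.passed() for r in records) / len(records)


if __name__ == "__main__":
    ok = TaskRecord(task_id=1, intended_path=("2", "1"))
    ok.press("2")
    ok.press("1", "hesitated before choosing")

    bailed = TaskRecord(task_id=1, intended_path=("2", "1"))
    bailed.press("0", "pressed 0 to reach a CSR")

    print(ok.passed(), bailed.passed(), pass_rate([ok, bailed]))
```

Keeping the raw press sequence, rather than only a pass/fail mark, lets you see later where subjects left the intended path.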

 
