UI Design Newsletter – September, 2002


Insights from Human Factors International


In This Issue:

How reliable is usability performance testing?

Bob Bailey, Ph.D., Chief Scientist for HFI, discusses the limitations of usability performance testing.

The Pragmatic Ergonomist

Eric Schaffer, Ph.D., CPE, founder and CEO of HFI, offers practical advice.

   
How reliable is usability performance testing?
   

Kessner, M. (2000), On the reliability of usability testing, Carleton University Master's Thesis, Ottawa, Ontario, December.

Kessner, M., Wood, J., Dillon, R.F. and West, R.L. (2001), On the reliability of usability testing, CHI 2001 Poster.

Molich, R., Thomsen, A.D., Karyukina, B., Schmidt, L., Ede, M., Oel, W.V. and Arcuri, M. (1999), Comparative evaluation of usability tests, CHI'99 Extended Abstract, 83-84 (Summary available.)

Molich, R., Bevan, N., Curson, I., Butler, S., Kindlund, E., Miller, D., Kirakowski, J. (1998), Comparative evaluation of usability tests, Proceedings of the Usability Professionals Association. (Summary available.)

Rolf Molich of DialogDesign in Denmark published two articles (Molich et al., 1998; Molich et al., 1999) over the past three years that helped us better understand the limitations of even our best usability testing method – performance testing.

He and his colleagues did a comparative evaluation of usability tests by having four commercial usability labs carry out tests on the same commercially available calendar program. The purpose of the comparative evaluation was to observe the different ways in which independent laboratories conducted usability tests. The testers independently performed usability tests that each involved about five typical users, and then prepared a test report. Their results showed that some labs found few usability problems (4), while others found many (98).

Usability Laboratories    A    B    C    D
Usability Specialists     2    2    1    3
Number of Tests          18    5    4    5
Problems Found            4   98   25   35

Only one problem was found by all four teams, and over 90% of the problems found by each team were found only by that team.

Molich and his colleagues conducted a follow-up to the first test to determine if the results were unique or could be replicated. In the second study, seven different professional usability labs and two university student teams independently carried out usability tests of a well-known Web site – hotmail.com. They each prepared and submitted their standard test report. Again, their results showed that some labs found few problems (10), while others found many (150).

Usability Laboratories    A    B    C    D    E    F    G    H    I
Usability Specialists     2    7    1    1    3    1    1    3    7
Number of Tests           7    6    6   50    9    5   11    4    6
Problems Found           26  150   17   10   68   75   30   18   20

The results from the first study were, indeed, replicated. Again, there seemed to be little consistency across testing organizations. Over half (55%) of the problems found by each team were found only by that team.

More recently, Martin Kessner (Kessner, 2000; Kessner et al., 2001) from Carleton University in Ottawa had six usability testing teams conduct usability tests on a prototype of a system.

He attempted to improve the agreement of the testing teams by

  • testing a prototype that had not yet been used by actual users,
  • limiting the issues to be evaluated to five questions specified by designers,
  • focusing exclusively on usability issues (excluding all marketing and other issues),
  • having two evaluators group similar observations into categories of problems that were essentially the same, and
  • using only professional usability teams (no student teams).

From the original total of 117 potential "usability problems" reported by all the testing teams, the evaluators excluded 31 as non-usability problems. They then combined similar problems and ended up with a final number of 36 unique usability problems. Consistent with the first two studies, none of the problems was found by every team, and a large proportion of the problems (44%) were found by one team only.
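The overlap figures quoted in these studies can be reproduced from a simple tally once each team's findings have been reduced to a common list of problems. The following is a minimal sketch of that tally, not the procedure the study authors used; the team names and problem identifiers are hypothetical and are not data from any of the three studies.

```python
from collections import Counter


def overlap_summary(findings):
    """Summarize agreement across teams.

    findings: dict mapping a team name to the set of problem ids
    that team reported (after similar problems have been merged).
    """
    # Count how many teams reported each distinct problem.
    counts = Counter(p for problems in findings.values() for p in problems)
    total = len(counts)
    found_by_one = sum(1 for c in counts.values() if c == 1)
    found_by_all = sum(1 for c in counts.values() if c == len(findings))
    return {
        "unique_problems": total,
        "found_by_one_team": found_by_one,
        "found_by_all_teams": found_by_all,
        "share_found_by_one_team": found_by_one / total if total else 0.0,
    }


# Hypothetical findings, for illustration only.
findings = {
    "Team 1": {"P1", "P2", "P3"},
    "Team 2": {"P2", "P4"},
    "Team 3": {"P2", "P3", "P5", "P6"},
}
summary = overlap_summary(findings)
print(summary)
```

In this made-up example, four of the six distinct problems are reported by a single team, and only one is reported by all three.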

When considering the five specific questions that designers wanted answered, there was moderate agreement among the teams on two questions, and low agreement on the other three.

Taken together, the findings of these three studies show that there is considerable need for improvement in the usability testing process. Contrary to what some would like us to believe, effective usability testing is extremely difficult to do well. As a discipline, we need fewer "discount" methods, and more research-based, truly valid methods for finding true usability problems.

These findings show that even experienced usability professionals have difficulty in identifying usability problems. Should designers trust all observations made by usability professionals? With this much variability in performance testing results, should Web site designers trust any observations made by usability professionals?

Usability professionals do not let clients drop off a prototype Web site with the request to find as many problems as possible; and professional designers do not take seriously the never-ending list of "problems" identified by someone who has a usability lab with fancy video equipment. Any amateur with a conference room and a couple of subjects can use a performance test to find all kinds of so-called "usability problems." Some do not even need the test subjects – they can find a multitude of "problems" just by staring at a Web site and fiddling with the links.

I agree with Kessner et al. (2001): the one thing that will most likely reduce the large-scale disagreements among usability testers is to have designers specify precisely the usability questions they have.
Ideally, these questions will include the maximum allowable time for task completion, and a clear definition of success for each task. The true usability professional can then effectively use a performance test to identify those usability problems that most need finding and fixing.
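One way to make such a question precise is to write it down as a criterion that each test session can be scored against. The sketch below is only an illustration under stated assumptions: the class names, the task, the 60-second limit, and the session timings are all hypothetical, and whether a user "succeeded" is still a judgment the tester makes against the written success definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCriterion:
    task: str
    max_seconds: float       # maximum allowable time for task completion
    success_definition: str  # what counts as success, agreed with designers


@dataclass(frozen=True)
class Observation:
    task: str
    seconds: float
    succeeded: bool  # tester's judgment against success_definition


def success_rate(criterion, observations):
    """Share of sessions that met the definition within the time limit."""
    relevant = [o for o in observations if o.task == criterion.task]
    if not relevant:
        return None  # no sessions recorded for this task
    passed = sum(
        1 for o in relevant
        if o.succeeded and o.seconds <= criterion.max_seconds
    )
    return passed / len(relevant)


# Hypothetical criterion and sessions, for illustration only.
criterion = TaskCriterion(
    task="Add an appointment",
    max_seconds=60.0,
    success_definition="Appointment appears on the correct date and time",
)
observations = [
    Observation("Add an appointment", 45.0, True),
    Observation("Add an appointment", 75.0, True),   # correct but too slow
    Observation("Add an appointment", 30.0, False),  # fast but wrong
    Observation("Add an appointment", 50.0, True),
]
rate = success_rate(criterion, observations)
print(rate)
```

With the criterion fixed in advance, two teams scoring the same sessions have far less room to disagree about whether a problem exists.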

The Pragmatic Ergonomist
   
 

While this compilation should give every developer or human factors professional a jolt, I am not surprised by these findings. For 25 years I have routinely interviewed people who said they were human factors specialists because "they were humans." Conversely, I have also interviewed hundreds of heavily credentialed human factors professionals who understood the usability technology but had no practical sense for what would be important. When testing, it is essential to test a set of scenarios that fit closely with the business imperative of the site. It is also necessary to test with a reasonable number of subjects (maybe 20 – not just 5). It is also essential to select your usability staff or consultants with care.

Usability testing is an essential part of our craft. We must gather data well and use that to establish a good initial design. Then, just as a potter shapes a vessel, we progressively use the results to adjust our design. Sometimes we must rethink the whole structure. Sometimes the adjustment of a single word affects the success rate enormously. Even with the most highly trained, intuitive, and savvy professional, interface design is by nature an iterative process.

Two weeks ago I ran a usability test on a financial planning package. Over 80% of the users stopped at a question about their "income range." They did not understand why they would use it, even though there was an explanation just above that said it was to allow calculation of tax rate. Most said they would not proceed. We switched the question from "Income Range" to "Tax Bracket" (showing the related income ranges). Problem gone. How could we NOT take advantage of usability testing results like that?

