Insights from Human Factors International
In This Issue:
Why "how many users" is just the wrong question – Rethinking the requirements for valid usability tests
HFI Chief Scientist, Kath Straub, PhD, CUA, revisits the question about the number of users required for an effective usability test.
The Pragmatic Ergonomist
Dr. Eric Schaffer, Ph.D., CPE, founder and CEO of HFI, offers practical advice.
Death. Taxes.
How-many-users.
Every day in offices around the world, usability professionals ask and are asked this question: How many users do we need for our usability test? It's an important question. We want to find most of the problems, and especially the most severe ones. So, we need to test enough people. But usability testing is expensive, and the cost increases with each participant. So, we don't want to test too many, either.
On the one hand, synthesizing the received theoretical wisdom suggests that there is an answer to this question. And the answer is "5" (Virzi, 1992; Nielsen and Landauer, 1993). That is, based on a probabilistic formula, you will need to test 5 users to find about 85% of the problems that will trip up 1/3 or more of your users. The number 5 is very concrete. Practitioners like it. 5 is easy to remember.
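The arithmetic behind the "5" can be sketched in a few lines. This is only an illustration, assuming the commonly used form of the model: each user independently has the same probability p of hitting a given problem, so n users are expected to find a share 1 - (1 - p)^n of such problems. The function names are made up for this example, and p = 1/3 is taken from the "1/3 or more of your users" figure above.

```python
import math


def proportion_found(n_users: int, p: float) -> float:
    """Expected share of problems found by n_users, assuming each user
    independently hits a given problem with probability p."""
    return 1.0 - (1.0 - p) ** n_users


def users_needed(target: float, p: float) -> int:
    """Smallest number of users whose expected share found reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 1 / 3  # assumption: a problem that trips up one user in three
    for n in range(1, 9):
        print(f"{n} users: {proportion_found(n, p):.0%} of problems expected")
    print("Users needed for 85%:", users_needed(0.85, p))
```

Under these assumptions, five users are expected to find roughly 87% of such problems, and each additional user adds less than the one before. Note that this is an expected value, not a guarantee for any single test.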
On the other hand, this question gets debated every year at the CHI conference. You can count on it. Like death and taxes. The same debate. Given that the UX community (re-)debates this every year, it seems that the wisdom has not been so well received.
Blue! No, Green!...
No, 5!
That the number 5 has such staying power says something interesting about human memory and the way people reason. The 5-formula can work. But, like tossing a coin, it's probabilistic. If you keep flipping a fair coin over and over, it will come up heads about half the time. But it can also come up tails nine times in a row.
Similarly, if you run enough usability tests with 5 users, on average you will find most of the errors most of the time. But if you run only one test (or just a few) with 5 users, it's possible that you will uncover fewer errors than the formula projects. (See Spool and Schroeder, 2001; Faulkner, 2003; or, if you are less ambitious, the May 2004 newsletter.)
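The gap between "on average" and "in your one test" can be made visible with a small simulation. The sketch below is not taken from any of the cited studies. It assumes 20 problems, each hit by each user independently with probability 1/3 (real problems vary in how easy they are to hit), and the function name and parameters are invented for illustration.

```python
import random


def simulate_tests(n_problems=20, n_users=5, p=1 / 3, n_tests=2000, seed=1):
    """Return the share of problems found in each simulated 5-user test.

    Assumption: every problem is hit by every user independently with the
    same probability p. This is a simplification, not a model of real data.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    shares = []
    for _ in range(n_tests):
        found = 0
        for _ in range(n_problems):
            # A problem counts as found if at least one user hits it.
            if any(rng.random() < p for _ in range(n_users)):
                found += 1
        shares.append(found / n_problems)
    return shares


if __name__ == "__main__":
    shares = simulate_tests()
    print(f"average share found: {sum(shares) / len(shares):.0%}")
    print(f"worst single test:   {min(shares):.0%}")
    print(f"best single test:    {max(shares):.0%}")
```

Across many simulated tests the average sits near the formula's projection, while individual tests fall noticeably above or below it. That spread is the coin-flip point in miniature.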
There are other challenges with the 5-formula. For instance, to calculate the number of test participants you need, you must know a priori how many problems there are to find. If you knew that, you likely wouldn't need to test to find them, eh?
Reach beyond...
# of users
Not surprisingly, the debate churned on in San Jose (CHI 2007). But this year, Lindgaard and Chattratichart (2007) threw down a different gauntlet. The obstacle to solving the problem, they said, is the question. "How many users" is the wrong way to think about it.
In usability testing, we are looking for mismatches between the site/app model and the user's mental model on the key and critical tasks. Framed this way, the criterion that determines how many problems get uncovered is how many tasks participants try, not how many participants there are.
To test their claim, Lindgaard and Chattratichart reanalyzed the usability testing data from CUE-4* (Molich, 2003 – Workshop Reference). Within that project, 9 highly experienced teams used think-aloud techniques to independently test the same site. The teams received identical input from the coordinators (site objectives, problem criteria, testing focus). Each team shaped their own testing plan and protocol, conducted the testing, and aggregated the findings into a pre-determined feedback format.
Lindgaard and Chattratichart looked for similarities and differences across the methods and findings reported by each team. Specifically, they were seeking relationships between test design (e.g., # users, # tasks) and number of problems identified.
Their study reports that there was no reliable correlation between the number of users tested and the number of usability problems uncovered. Testing more users did not ensure that more problems would be discovered. Further, although each of the 9 teams tested 5 users or more, they reported only 7-43% of the known problems, not the 85% predicted by the 5-formula.
In contrast, their analysis showed a significant positive correlation between the number of tasks evaluated and the number of problems uncovered. That is, the more tasks a team included in their testing protocol, the more problems they uncovered.
They conclude that other things being equal (e.g., quality of recruiting), the better predictor of the productivity of usability testing is the number of tasks participants (try to) complete, not the number of participants who try to complete them.
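For readers who want to run the same kind of check on their own test data, here is a rough sketch of the comparison described above: correlate the number of problems found with the number of users, and separately with the number of tasks. The figures below are INVENTED for nine hypothetical teams and are not the CUE-4 data. The Pearson coefficient is used as an assumption about the statistic, and a real analysis would also need a significance test.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# INVENTED numbers for nine hypothetical teams; NOT the CUE-4 data.
users = [5, 6, 5, 8, 12, 6, 9, 15, 7]          # participants per team
tasks = [3, 5, 4, 6, 8, 10, 12, 7, 14]         # tasks per team
problems = [8, 12, 10, 15, 19, 24, 28, 17, 33]  # problems reported

r_users = pearson_r(users, problems)
r_tasks = pearson_r(tasks, problems)

if __name__ == "__main__":
    print(f"users vs. problems: r = {r_users:.2f}")
    print(f"tasks vs. problems: r = {r_tasks:.2f}")
```

With only nine data points, a correlation like this is suggestive at best, which is one more reason to treat any single rule of thumb with caution.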
______________
* The CUE Studies (Molich and Dumas, in press; Molich, Ede, Kaasgaard and Karyukin, 2004, among others) compare methods and findings of different teams conducting the same usability test. CUE findings show that different usability testing teams evaluating the same interface report different numbers of usability problems, often with very little overlap in the problems identified. There's clearly more to it than number of users.
The Pragmatic Ergonomist
This result is fantastic! It's like trying to find potholes in a city. Not every car hits every pothole in the road. So you need to send a number of cars down each road. But it is even more important to send cars down a larger NUMBER of roads. The key seems to be in more tasks, not just more users. The problem is that you can only run a limited number of tasks with a single test participant. More than 60 or perhaps 90 minutes of testing won't work well.
I propose a "Lindgaard-Chattratichart Testing Strategy." Test 3 different groups of participants. Put maybe 6 to 12 people in each group. Then have each group do a different basket of tasks. This will allow us to test a LOT of different tasks and should get a far better level of reliability.
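One way to lay out such a plan is sketched below. It simply deals tasks and participants round-robin into three groups, so each group gets its own basket of tasks. The function name, the round-robin rule, and the example counts (24 tasks, 18 participants) are assumptions made for illustration. In practice you would probably balance the baskets by task length and importance as well.

```python
def plan_baskets(tasks, participants, n_groups=3):
    """Deal tasks and participants round-robin into n_groups groups.

    Returns a list of dicts, one per group, each holding that group's
    participants and its own basket of tasks. Every task lands in
    exactly one basket.
    """
    groups = [{"participants": [], "tasks": []} for _ in range(n_groups)]
    for i, person in enumerate(participants):
        groups[i % n_groups]["participants"].append(person)
    for i, task in enumerate(tasks):
        groups[i % n_groups]["tasks"].append(task)
    return groups


if __name__ == "__main__":
    tasks = [f"task {i}" for i in range(1, 25)]     # 24 hypothetical tasks
    participants = [f"P{i}" for i in range(1, 19)]  # 18 people, 6 per group
    for n, group in enumerate(plan_baskets(tasks, participants), start=1):
        print(f"Group {n}: {len(group['participants'])} participants, "
              f"{len(group['tasks'])} tasks")
```

In this example each participant sees only 8 tasks, which keeps a session within the 60 to 90 minute limit, while the study as a whole covers all 24.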
References
Faulkner, L. Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behavior Research Methods, Instruments & Computers, 35, 3, Psychonomic Society (2003), 379-383.
Lindgaard, G. and Chattratichart, J. Usability Testing: What Have We Overlooked? CHI 2007 Proceedings, ACM Press (2007).
Molich, R. & Dumas, J. S. Comparative Usability Evaluation (CUE-4). Behaviour & Information Technology, Taylor & Francis (in press).
Molich, R. & Jeffries, R. Comparative expert review. In Proceedings CHI 2003, Extended Abstracts, ACM Press (2003), 1060-1061.
Molich, R., Ede, M. R., Kaasgaard, K., & Karyukin, B. Comparative usability evaluation. Behaviour & Information Technology, 23, 1, Taylor & Francis (2004), 65-74.
Nielsen, J., & Landauer, T. K. A mathematical model of the finding of usability problems. In Proceedings of INTERCHI 1993, ACM Press (1993), 206-213.
Spool, J. & Schroeder, W. Testing Websites: Five users is nowhere near enough. In Proceedings CHI 2001, Extended Abstracts, ACM Press (2001), 285-286.
Virzi, R.A. Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34, HFES (1992), 457-468.