Insights from Human Factors International
In This Issue:
Pitting usability testing against heuristic review
Kath Straub, Ph.D., CUA, Chief Scientist of HFI, looks at when
to use usability testing and when to use heuristic reviews.

The Pragmatic Ergonomist
Eric Schaffer, Ph.D., CPE, founder and CEO of HFI, offers practical
advice.
Pitting usability testing against heuristic review
Consider this scenario: You are managing the Intranet applications for
a large company. You've spent the last year championing data-driven (re-)design
approaches with some success. Now there is an opportunity to revamp a
widely used application with significant room for improvement. You need
to do the whole project on a limited dollar and time budget. It's critical
that the method you choose models a user-centered approach that prioritizes
the fixes in a systematic and repeatable way. It is also critical that
the approach you choose be cost-effective and convincing. What do you
do?
Independent of the method you pick, your tasks are essentially to:
- Identify the problems
- Prioritize them based on impact to use
- Prioritize them based on time/cost benefits of fixing the problems
- Design and implement the fixes
In this situation, most people think of usability testing and heuristic
(or expert) review.
In usability testing, a set of representative users are asked (individually)
to complete a series of critical and/or typical tasks using the interface.
During the testing, usability specialists observe the self-evidency of
the task process. That is, they watch as participants work through preselected
scenarios and note when participants stumble, make mistakes, and give
up. The usability impact is prioritized objectively according to how frequently
a given difficulty is observed across the participants. Prioritizing the cost-benefits
of fixes and suggesting specific design improvements for that
interface are typically based on the practitioner's broader experience
with usability inspection and testing.
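The frequency-based prioritization described above can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from any of the studies cited here: the participant IDs, the problem labels, and the `rank_by_frequency` helper are all hypothetical.

```python
from collections import Counter

def rank_by_frequency(observations):
    """Rank problems by the share of participants who hit them.

    observations maps a participant ID to the set of problems that
    participant ran into (a hypothetical data structure).
    """
    n = len(observations)
    counts = Counter(p for problems in observations.values() for p in set(problems))
    # Sort by descending frequency, then by name so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(problem, hits / n) for problem, hits in ranked]

# Hypothetical session notes from five participants.
sessions = {
    "P1": {"missed Save button", "unclear error message"},
    "P2": {"missed Save button"},
    "P3": {"missed Save button", "gave up on search"},
    "P4": {"unclear error message"},
    "P5": {"missed Save button"},
}

for problem, share in rank_by_frequency(sessions):
    print(f"{share:.0%}  {problem}")
```

A ranking like this covers only the impact side. Weighing each fix against its time and cost still relies on the practitioner's judgment.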
As the name implies, heuristic review offers a shortcut method for usability
evaluations. Here, one or more specialists examine the application to
identify potential usability problems. Problems are noted when the system
characteristics violate constraints known to influence usability. The
constraints, which vary from practitioner to practitioner, are typically
based on principles reviewed in usability tomes (e.g., Molich and Nielsen,
1990) and/or criteria derived from and validated by human factors research
(e.g., Gerhardt-Powals, 1996). The critical steps of prioritizing the severity
of the problems identified and suggesting specific remediation approaches
for that interface are based on the practitioner's broader experience
with usability inspection and testing.
Empirical evaluations of the relative merit of these approaches outline
both strengths and drawbacks for each. Usability testing is touted as
the optimal methodology because the results are derived directly from the
experiences of representative users. For example, Nielsen and Phillips
(1993) report that despite its greater absolute cost, user testing 'provided
better performance estimates' of interface effectiveness. The tradeoff
is that coordination, testing, and data reduction add time to the process
and increase the overall labor and time cost of usability testing. As
such, proponents of heuristic review plug its speed of turnaround and
cost-effectiveness. For example, Jeffries, Miller, Wharton, and Uyeda
(1991) report a 12:1 superiority for an expert inspection method over
usability testing based on a strict cost-benefit analysis. On the downside,
there is broad concern that the heuristic criteria do not focus the evaluators
on the right problems (Bailey, Allan and Raiello, 1992). That is, simply
evaluating an interface against a set of heuristics generates a long list
of false alarm problems. But it doesn't effectively highlight the real
problems that undermine the user experience.
There are many, many more studies that have explored this question. Overall,
the findings of studies pitting usability testing against expert review
lead to the same ambivalent (lack of) conclusions.
Hear what I say, or watch what I do?
In an attempt to find alternative approaches to compare usability testing
and expert review, Muller, Dayton and Root (1993) reanalyzed the findings
from four studies (Desurvire, Kondziela, and Atwood, 1992; Jeffries and
Desurvire, 1992; Jeffries, Miller, Wharton and Uyeda, 1991; and Karat,
Campbell and Fiegel, 1992). Rather than looking at the raw number of problems
identified by each technique, their re-analysis categorized the findings
of the previous studies on parameters such as:
- # of problem classes per hour invested,
- # of classes of usability problems identified,
- likelihood of identifying severe problems,
- uniqueness of results, and
- average cost/problem identified for each technique.
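As a rough sketch of how normalized measures like these can be computed, the Python below derives two of them from a per-technique tally. The figures are invented purely for illustration and are not taken from Muller, Dayton and Root or from the studies they reanalyzed; the `Tally` class and its fields are likewise assumptions.

```python
from dataclasses import dataclass

@dataclass
class Tally:
    """Hypothetical summary of one evaluation technique in one study."""
    technique: str
    problem_classes: int   # distinct classes of problems found
    problems: int          # total problems found
    hours: float           # evaluator hours invested
    cost: float            # total cost in dollars

    def classes_per_hour(self) -> float:
        return self.problem_classes / self.hours

    def cost_per_problem(self) -> float:
        return self.cost / self.problems

# Invented figures, for illustration only.
tallies = [
    Tally("heuristic review", problem_classes=12, problems=40, hours=20, cost=2000),
    Tally("usability testing", problem_classes=9, problems=30, hours=60, cost=6000),
]

for t in tallies:
    print(f"{t.technique}: {t.classes_per_hour():.2f} classes/hour, "
          f"${t.cost_per_problem():.0f} per problem")
```

Which technique comes out ahead depends entirely on the inputs, so a comparison like this is only as stable as the studies feeding it.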
Again, their re-analysis demonstrated no stable difference indicating
that either usability testing or heuristic review (conducted by human
factors professionals) is a superior technique.
An array of methodological inconsistencies makes interpreting the findings
in toto even more challenging. The specific types of interfaces or tasks
used in comparison vary widely from study to study. Many studies do not
clearly articulate what the "experts" are expected to do to
come up with their findings (much less, what they really DO). The specific
heuristics applied are rarely specified clearly. The level of expertise
of the evaluators is rarely described or clearly equated, although it
is often offered informally as a factor in the diversity of outcomes.
As such, it is possible that, among other things, the conclusions of specific
studies falsely favor a method, when relative benefits really result from
the broader and deeper experience of the individual implementing the method
(John, 1994).
So, what's an Intranet manager to do?
Equivocal research findings aren't really helpful when your task is to
select the most cost-effective means of identifying, prioritizing and
fixing problems in your interface. To make it worse, it appears that the
problems identified by the two approaches are largely non-overlapping.
According to Law and Hvannberg (2002), the problems identified by usability
testing tend to reflect flaws in the design of the task scenarios, such
as task flows that do not reflect the steps/order that users expect. In
contrast, expert reviews highlight problems intrinsic to the features
of the system itself, such as design consistency. Actually, that's the
good news.
Findings from a recent study extend those results and may help
identify the parameters for choosing the right solution. Instead
of pitting one strategy against the other, this study focuses on identifying
the qualitative differences between the findings of usability testing
and expert review.
Levels of Understanding
Rasmussen (1986) identified three levels of behavior that could lead
to interface challenges: skill-based, rule-based, and knowledge-based
behavior.
Success at the skill-based level depends on the users'
ability to recognize and pay attention to the right signals presented
by the interface. Skill-based accuracy can be undermined, for example,
when non-critical elements of an interface flash or move. These attention-grabbing
elements may pull a user's focus from the task at hand, causing
errors.
Success at the rule-based level depends on the users'
ability to perceive and respond to the signs that are associated with
the ongoing procedure. Users stumble when they fail to complete a step
or steps because the system (waiting) state or next-step information was
not noticeable or clearly presented.
Success at the knowledge-based level depends on the
users' ability to develop the right mental model for the system. Users
acting based on an incorrect or incomplete mental model may initiate the
wrong action resulting in an error.
Fu, Salvendy and Turley (2002) contrasted usability testing and heuristic
review by evaluating their effectiveness at identifying interface problems
at Rasmussen's three levels of processing. In their study, evaluators
were assigned to evaluate an interface either by observing usability
testing or by conducting a heuristic review, starting with an identical set
of scenario-based tasks. Across the study, 39 distinct usability problems
were identified. Consistent with previous research, heuristic evaluation
identified more of the problems (87%) than usability testing did
(54%). There was a 41% overlap in the problems identified.
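These three percentages are consistent with one another: every one of the 39 problems was found by at least one method, so 87% + 54% - 41% = 100%. The short Python check below converts the percentages into approximate problem counts. The rounded counts are derived here for illustration and are not reported in the text above.

```python
TOTAL = 39  # distinct problems reported across the study

# Percentages quoted above, converted to approximate problem counts.
heuristic = round(0.87 * TOTAL)   # found by heuristic review
testing = round(0.54 * TOTAL)     # found by usability testing
both = round(0.41 * TOTAL)        # found by both methods

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
either = heuristic + testing - both

print(heuristic, testing, both, either)
print("heuristic review only:", heuristic - both)
print("usability testing only:", testing - both)
```

Under this rounding, roughly 18 problems were found only by heuristic review and roughly 5 only by usability testing.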
More interestingly, when the problems/errors were categorized by Rasmussen's
behavior levels, heuristic review and usability testing identified complementary
sets of problems: heuristic review identified significantly more skill-based
and rule-based problems, whereas usability testing identified challenges
that occur at the knowledge-based level of behavior.
Upon consideration, this distribution is not terribly surprising. Usability
testing identifies significantly more problems at the knowledge-based level.
Knowledge-based challenges arise when users are learning (creating
or modifying their mental models) during the course of the task
itself. Often the problems that surface here are the result of a mismatch
between the expected and actual task flow. Since experts' mental models
of interaction are usually well articulated, experts are not good
at conjuring up the experiences or anticipating the expectations of novice
users.
In contrast, skill- and rule-based levels of behavior are well studied
and documented in the attention and perception literature (e.g., Proctor and
Van Zandt, 1994). Human factors courses often focus on these theories.
The criteria for heuristic review are essentially derived from them. It
is not surprising that usability specialists focusing on heuristic criteria
for evaluation would identify relatively more problems at these levels.
Not incidentally, parameters of interface design affecting skill- and
rule-based levels of behavior reflect characteristics that are intrinsic
to the interface itself. Intrinsic problems are more likely to challenge
advanced users because they present a fundamental challenge to their experience-based
"standard model" of interface interactions.
Fu, Salvendy and Turley (2002) conclude that the most effective approach
is to integrate both techniques at different times in the design process.
Based on their analysis, heuristic review should be applied first or early
in the (re-)design process to identify problems associated with the lower
levels of perception and performance. Once those problems are resolved,
usability testing can be used to focus effectively on higher-level interaction
issues without the distraction of problems associated with skill- and
rule-based levels of performance.
Mindful of budget and time constraints, Fu and colleagues also note that
if redesign is concerned only with icon design or layout, heuristic review
may be sufficient to the task. However, if modification affects software
structure, interaction mapping or complex task flows, usability testing
is the better choice. To that end, the more complex the to-be-performed
tasks are, the more critical representative usability testing becomes.
The more complex the proposed redesign, the more critical it is that both
methods be employed.
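The guidance above amounts to a simple decision rule. The sketch below encodes it in Python as one possible reading of that guidance. The function name, its parameters, and the "surface" and "structural" labels are assumptions made for illustration, not terms used by Fu and colleagues.

```python
def recommend_methods(change_scope: str, complex_redesign: bool = False) -> list[str]:
    """Suggest evaluation methods for a redesign.

    change_scope: "surface" for icon design or layout changes,
                  "structural" for changes to software structure,
                  interaction mapping, or complex task flows.
    complex_redesign: True when the overall redesign is complex enough
                  to justify using both methods.
    """
    if change_scope not in ("surface", "structural"):
        raise ValueError(f"unknown change_scope: {change_scope!r}")
    if complex_redesign:
        # Heuristic review first, then usability testing.
        return ["heuristic review", "usability testing"]
    if change_scope == "surface":
        return ["heuristic review"]
    return ["usability testing"]

print(recommend_methods("surface"))
print(recommend_methods("structural"))
print(recommend_methods("structural", complex_redesign=True))
```

In practice the inputs are matters of degree rather than clean categories, so a rule like this is a starting point for discussion, not a substitute for judgment.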
It depends...
So what should you do with your evaluation project? Like most other projects,
it depends on the specific case. Despite the chaotic nature of the field,
it is still possible to draw a few conclusions from these studies:
- heuristic (expert) reviews and usability testing identify different
types of usability problems
- expert reviews really only work well when experts do them
- combining techniques optimizes the return
Taken together, these suggest that to select the most appropriate method
you will need to consider more than your budget and the possible methodologies.
To make the best decision, you will also need to weigh the expertise of
your evaluators, the maturity of your application, the complexity of the
tasks, and possibly even the current status of your usability program.
The Pragmatic Ergonomist
When people ask me to run a usability test I usually recommend an expert
review first.
In formative testing (intended to mold the design) expert reviews are
just too good a deal to pass up. They have to be done by experts and the
experts must focus on systematic analysis and comparison with many hundreds
of research-based principles. In fact, I think a proper expert review
can get at issues of mental model and navigational structure as well as
usability testing.
The expert review can find opportunities that could never be discovered
in a test. We might see that users take 45 seconds to request a wakeup
call using an automated hotel wakeup system. But only an expert would
realize that defaults could be used to cut this time significantly.
Usability testing is more expensive. It may be needed to graphically
convince stakeholders that there are usability issues (it is hard to argue
with video of users in tears). Testing will identify things that the experts
miss and as such is a good follow-on to expert reviews. Usability testing
is required for summative analysis (measuring if usability objectives
have been met).
References
Bailey, R. W., Allan, R. W., and Raiello, P. (1992). Usability testing
vs. heuristic evaluation: A head-to-head comparison. Proceedings
of the 36th Annual Human Factors Society Meeting. pp.409-413.
Desurvire, H., Kondziela, J., and Atwood, M. (1992). What is gained and
what is lost when using evaluation methods other than usability testing.
Paper Presented at Human Computer Interaction Consortium.
Gerhardt-Powals, J. (1996). Cognitive engineering principles for enhancing
human-computer performance. International Journal of Human-Computer
Interaction, 8(2), 189-211.
Molich, R. and Nielsen, J. (1990). Improving a human-computer dialogue.
CACM, 33(3), 338-348.
Nielsen, J. (1994). Enhancing the explanatory power of usability heuristics.
Proceedings of CHI'94. ACM, New York. pp. 153-158.
Jeffries, R. and Desurvire, H. (1992). Usability Testing vs. Heuristic
Evaluation: Was there a Contest? SIGCHI Bulletin,
24(4) 39-41.
Jeffries, R., Miller, J., Wharton, C., and Uyeda, K. (1991). User interface
evaluation in the real world: A comparison of four techniques. In Proceedings
of CHI'91. (New Orleans, LA) ACM, New York. pp. 119-124.
Karat, C. M., Campbell, R. and Fiegel, T. (1992). Comparison of empirical
testing and walkthrough methods in user interface evaluation. In Proceedings
of CHI '92 (Monterey, CA) ACM, New York. pp. 397-404.
John, B. (1994). Toward a deeper comparison of Methods: A reaction to
Nielsen & Phillips and New Data. CHI'94 Companion.
(Boston, Massachusetts), ACM, New York, pp. 285-286.
Law, L. and Hvannberg, E. T. (2002). Complementarity and Convergence
of Heuristic Evaluation and Usability Test: A case Study of UNIVERSAL
Brokerage Platform. NordiCHI (October 19-23,
2002), pp. 71-80.
Muller, M.J., Dayton, T., and Root, R. W. (1993). Comparing studies that
compare usability methods: An unsuccessful search for stable criteria.
INTERCHI '93 Adjunct Proceedings, pp. 185-186.
Rasmussen, J. (1986). Information Processing and Human Machine Interaction:
An approach to Cognitive Engineering. (New York: Elsevier.)
Savage, P. (1996). User Interface Evaluation in an Iterative Design Process:
A comparison of three techniques. CHI'96 Companion.
(Vancouver, BC, Canada April 13-18)
Alan Wexelblat
I have to say that I find it totally unsurprising that the quality of
the evaluator is the most important factor in determining the effectiveness
of usability evaluation, regardless of the method.
We've known for thirty years in software development that quality of
personnel was (by at least a factor of 4) the most important thing affecting
the quality of the resulting product. Things like software development
methods and tools are known to have roughly a 10-20% impact on metrics
such as speed of delivery or errors in final product. It therefore would
be unsurprising to see a similar result in the HCI field.