Consider this scenario: You are managing the intranet applications for a large company. You've spent the last year championing data-driven (re-)design approaches with some success. Now there is an opportunity to revamp a widely used application with significant room for improvement. You need to do the whole project on a limited budget of both money and time. It's critical that the method you choose models a user-centered approach that prioritizes the fixes in a systematic and repeatable way. It is also critical that the approach you choose be cost-effective and convincing. What do you do?
Independent of the method you pick, your tasks are essentially to:
In this situation, most people think of usability testing and heuristic (or expert) review.
In usability testing, a set of representative users is asked (individually) to complete a series of critical and/or typical tasks using the interface. During the testing, usability specialists observe how self-evident the task process is. That is, they watch as participants work through preselected scenarios and note when participants stumble, make mistakes, or give up. The usability impact is prioritized objectively according to how frequently a given difficulty is observed across the participants. Prioritizing the cost-benefit of fixes and suggesting specific design improvements for that interface typically draw on the practitioner's broader experience with usability inspection and testing.
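The frequency-based prioritization described above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard tool: the participant IDs, the problem labels, and the function name `rank_by_frequency` are all hypothetical.

```python
from collections import Counter


def rank_by_frequency(observations):
    """Rank problems by how many distinct participants ran into them.

    `observations` maps a participant ID to the list of problems noted
    for that participant (a hypothetical data layout).
    """
    counts = Counter()
    for problems in observations.values():
        # Count each problem once per participant, even if it recurred.
        counts.update(set(problems))
    total = len(observations)
    # Sort by descending count, then by label so the order is repeatable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(problem, n, n / total) for problem, n in ranked]


# Hypothetical session notes from four participants.
sessions = {
    "P1": ["missed Save button", "unclear error message"],
    "P2": ["missed Save button"],
    "P3": ["missed Save button", "unclear error message", "abandoned search"],
    "P4": ["unclear error message", "missed Save button"],
}

for problem, n, share in rank_by_frequency(sessions):
    print(f"{problem}: {n}/{len(sessions)} participants ({share:.0%})")
```

A tally like this only orders problems by how often they were observed; weighing the cost and benefit of each fix still depends on the practitioner's judgment.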
As the name implies, heuristic review offers a shortcut method for usability evaluations. Here, one or more specialists examine the application to identify potential usability problems. Problems are noted when the system characteristics violate constraints known to influence usability. The constraints, which vary from practitioner to practitioner, are typically based on principles reviewed in usability tomes (e.g., Molich and Nielsen, 1990) and/or criteria derived from and validated by human factors research (e.g., Gerhardt-Powals, 1996). The critical steps of prioritizing the severity of the problems identified and suggesting specific remediation approaches for that interface draw on the practitioner's broader experience with usability inspection and testing.
Empirical evaluations of the relative merit of these approaches outline both strengths and drawbacks for each. Usability testing is touted as the optimal methodology because the results are derived directly from the experiences of representative users. For example, Nielsen and Phillips (1993) report that despite its greater absolute cost, user testing 'provided better performance estimates' of interface effectiveness. The tradeoff is that coordination, testing, and data reduction add time to the process and increase the overall labor and time cost of usability testing. As such, proponents of heuristic review promote its speed of turnaround and cost-effectiveness. For example, Jeffries, Miller, Wharton, and Uyeda (1991) report a 12:1 superiority for an expert inspection method over usability testing based on a strict cost-benefit analysis. On the downside, there is broad concern that the heuristic criteria do not focus the evaluators on the right problems (Bailey, Allan and Raiello, 1992). That is, simply evaluating an interface against a set of heuristics generates a long list of false-alarm problems, but it doesn't effectively highlight the real problems that undermine the user experience.
Many more studies have explored this question. Overall, the findings of studies pitting usability testing against expert review lead to the same ambivalent (lack of) conclusions.
In an attempt to find alternative ways to compare usability testing and expert review, Muller, Dayton and Root (1993) reanalyzed the findings from four studies (Desurvire, Kondziela, and Atwood, 1992; Jeffries and Desurvire, 1992; Jeffries, Miller, Wharton and Uyeda, 1991; and Karat, Campbell and Fiegel, 1992). Rather than looking at the raw number of problems identified by each technique, their re-analysis categorized the findings of the previous studies on parameters such as:
Again, their re-analysis demonstrated no stable difference indicating that either usability testing or heuristic review (conducted by human factors professionals) is a superior technique.
An array of methodological inconsistencies makes interpreting the findings in toto even more challenging. The specific types of interfaces or tasks used in comparison vary widely from study to study. Many studies do not clearly articulate what the "experts" are expected to do to come up with their findings (much less, what they really DO). The specific heuristics applied are rarely specified clearly. The level of expertise of the evaluators is rarely described or clearly equated, although it is often offered informally as a factor in the diversity of outcomes. As such, it is possible that, among other things, the conclusions of specific studies falsely favor a method when the relative benefits really result from the broader and deeper experience of the individual implementing the method (John, 1994).
Equivocal research findings aren't really helpful when your task is to select the most cost-effective means of identifying, prioritizing and fixing problems in your interface. To make it worse, it appears that the problems identified by the two approaches are largely non-overlapping. According to Law and Hvannberg (2002), the problems identified by usability testing tend to reflect flaws in the design of the task scenarios, such as task flows that do not reflect the steps/order that users expect. In contrast, expert reviews highlight problems intrinsic to the features of the system itself, such as design consistency. Actually that's the good news.
Findings from a more recent study extend that work and may help identify the parameters for choosing the right method. Instead of pitting one strategy against the other, this study focuses on identifying the qualitative differences between the findings of usability testing and expert review.
Rasmussen (1986) identified three levels of behavior that could lead to interface challenges: skill-based, rule-based, and knowledge-based behavior.
Success at the skill-based level depends on the users' ability to recognize and pay attention to the right signals presented by the interface. Skill-based accuracy can be undermined, for example, when non-critical elements of an interface flash or move. These attention-grabbing elements may pull a user's focus from the task at hand, causing errors.
Success at the rule-based level depends on the users' ability to perceive and respond to the signs that are associated with the ongoing procedure. Users stumble when they fail to complete a step or steps because the system (waiting) state or next-step information was not noticeable or clearly presented.
Success at the knowledge-based level depends on the users' ability to develop the right mental model for the system. Users acting based on an incorrect or incomplete mental model may initiate the wrong action resulting in an error.
Fu, Salvendy and Turley (2002) contrasted usability testing and heuristic review by evaluating their effectiveness at identifying interface problems at Rasmussen's three levels of processing. In their study, evaluators were assigned to assess an interface either by observing usability testing or by conducting a heuristic review, starting from an identical set of scenario-based tasks. Across the study, 39 distinct usability problems were identified. Consistent with previous research, heuristic evaluation identified more of the problems (87%) than usability testing did (54%). There was a 41% overlap in the problems identified.
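As a quick consistency check on these figures, the percentages can be converted to approximate problem counts and combined with the inclusion-exclusion rule. The whole-number counts below are not stated in the text above; they are an assumption obtained by rounding each percentage of the 39 problems.

```python
TOTAL_PROBLEMS = 39  # distinct problems identified across the study


def approx_count(share, total=TOTAL_PROBLEMS):
    """Convert a reported share into a whole-number count (assumes simple rounding)."""
    return round(share * total)


heuristic = approx_count(0.87)  # problems found by heuristic review
testing = approx_count(0.54)    # problems found by usability testing
both = approx_count(0.41)       # problems found by both methods

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
union = heuristic + testing - both
only_heuristic = heuristic - both
only_testing = testing - both

print(f"heuristic review: {heuristic}, usability testing: {testing}, both: {both}")
print(f"found by at least one method: {union} of {TOTAL_PROBLEMS}")
print(f"heuristic review only: {only_heuristic}, usability testing only: {only_testing}")
```

Under that rounding assumption the three percentages are mutually consistent: the two methods together account for all 39 problems, and most of the non-overlapping problems come from heuristic review.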
More interestingly, when the problems/errors were categorized on Rasmussen's behavior levels, heuristic review and usability testing identified complementary sets of problems: heuristic review identified significantly more skill-based and rule-based problems, whereas usability testing identified challenges that occur at the knowledge-based level of behavior.
Upon consideration, this distribution is not terribly surprising. Usability testing identifies significantly more problems at the knowledge-based level. Knowledge-based challenges arise when users are learning (creating or modifying their mental models) during the course of the task itself. Often the problems that surface here result from a mismatch between the expected and actual task flow. Because experts' mental models of interaction are usually well developed, experts are not good at conjuring the experiences or anticipating the expectations of novice users.
In contrast, skill- and rule-based levels of behavior are well studied and documented in the attention and perception literature (e.g., Proctor and Van Zandt, 1994). Human factors courses often focus on these theories, and the criteria for heuristic review are essentially derived from them. It is not surprising that usability specialists applying heuristic criteria would identify relatively more problems at these levels. Not incidentally, the parameters of interface design that affect skill- and rule-based behavior reflect characteristics intrinsic to the interface itself. Intrinsic problems are more likely to challenge advanced users because they present a fundamental challenge to their experience-based "standard model" of interface interactions.
Fu, Salvendy and Turley (2002) conclude that the most effective approach is to integrate both techniques at different times in the design process. Based on their analysis, heuristic review should be applied first or early in the (re-)design process to identify problems associated with the lower levels of perception and performance. Once those problems are resolved, usability testing can be used to effectively focus on higher level interaction issues without the distraction of problems associated with skill- and rule-based levels of performance.
Mindful of budget and time constraints, Fu and colleagues also note that if redesign is concerned only with icon design or layout, heuristic review may be sufficient to the task. However, if modification affects software structure, interaction mapping or complex task flows, usability testing is the better choice. To that end, the more complex the to-be-performed tasks are, the more critical representative usability testing becomes. The more complex the proposed redesign, the more critical that both methods be employed.
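The rule of thumb above can be written down as a small decision helper. This is only one reading of the guidance, not a procedure given by Fu and colleagues: the category labels, the function name `recommend_methods`, and the choice to return both methods whenever surface and structural changes are combined are assumptions made for illustration.

```python
# Hypothetical labels for the kinds of change named in the text.
SURFACE_CHANGES = {"icon design", "layout"}
STRUCTURAL_CHANGES = {"software structure", "interaction mapping", "complex task flows"}


def recommend_methods(changes):
    """Return evaluation methods, in suggested order, for a set of planned changes."""
    changes = set(changes)
    unknown = changes - SURFACE_CHANGES - STRUCTURAL_CHANGES
    if unknown:
        raise ValueError(f"unrecognized change types: {sorted(unknown)}")

    methods = []
    if changes & SURFACE_CHANGES:
        # Surface-level changes: heuristic review may be sufficient on its own.
        methods.append("heuristic review")
    if changes & STRUCTURAL_CHANGES:
        # Structural changes: usability testing is the better choice,
        # run after any heuristic review so lower-level problems are fixed first.
        methods.append("usability testing")
    return methods


print(recommend_methods({"layout"}))
print(recommend_methods({"interaction mapping"}))
print(recommend_methods({"icon design", "complex task flows"}))
```

A helper like this captures only the scope of the redesign; evaluator expertise and task complexity still have to be weighed separately.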
So what should you do with your evaluation project? Like most other projects, it depends on the specific case. Despite the chaotic nature of the field, it is still possible to draw a few conclusions from these studies:
Taken together these suggest that to select the most appropriate method you will need to consider more than your budget and the possible methodologies. To make the best decision, you will also need to weigh the expertise of your evaluators, the maturity of your application, the complexity of the tasks, and possibly even the current status of your usability program.
Bailey, R. W., Allan, R. W., and Raiello, P. (1992). Usability testing vs. heuristic evaluation: A head-to-head comparison. Proceedings of the 36th Annual Human Factors Society Meeting, pp. 409-413.
Desurvire, H., Kondziela, J., and Atwood, M. (1992). What is gained and what is lost when using evaluation methods other than usability testing. Paper presented at Human Computer Interaction Consortium.
Gerhardt-Powals, J. (1996). Cognitive engineering principles for enhancing human-computer performance. International Journal of Human-Computer Interaction, 8(2), 189-211.
Molich, R. and Nielsen, J. (1990). Improving a human-computer dialogue. Communications of the ACM, 33(3), 338-348.
Nielsen, J. (1994). Enhancing the explanatory power of usability heuristics. Proceedings of CHI'94. ACM, New York, pp. 153-158.
Jeffries, R. and Desurvire, H. (1992). Usability Testing vs. Heuristic Evaluation: Was there a Contest? SIGCHI Bulletin, 24(4), 39-41.
Jeffries, R., Miller, J., Wharton, C., and Uyeda, K. (1991). User interface evaluation in the real world: A comparison of four techniques. In Proceedings of CHI'91 (New Orleans, LA). ACM, New York, pp. 119-124.
Karat, C. M., Campbell, R., and Fiegel, T. (1992). Comparison of empirical testing and walkthrough methods in user interface evaluation. In Proceedings of CHI '92 (Monterey, CA). ACM, New York, pp. 397-404.
John, B. (1994). Toward a deeper comparison of methods: A reaction to Nielsen & Phillips and new data. CHI'94 Companion (Boston, Massachusetts). ACM, New York, pp. 285-286.
Law, L. and Hvannberg, E. T. (2002). Complementarity and Convergence of Heuristic Evaluation and Usability Test: A case Study of UNIVERSAL Brokerage Platform. NordiCHI (October 19-23, 2002), pp. 71-80.
Muller, M.J., Dayton, T., and Root, R. W. (1993). Comparing studies that compare usability methods: An unsuccessful search for stable criteria. INTERCHI '93 Adjunct Proceedings, pp. 185-186.
Rasmussen, J. (1986). Information Processing and Human Machine Interaction: An approach to Cognitive Engineering. (New York: Elsevier.)
Savage, P. (1996). User Interface Evaluation in an Iterative Design Process: A comparison of three techniques. CHI'96 Companion. (Vancouver, BC, Canada April 13-18)
I have to say that I find it totally unsurprising that the quality of the evaluator is the most important factor in determining the effectiveness of usability evaluation, regardless of the method.
We've known for thirty years in software development that quality of personnel was (by at least a factor of 4) the most important thing affecting the quality of the resulting product. Things like software development methods and tools are known to have roughly a 10-20% impact on metrics such as speed of delivery or errors in final product. It therefore would be unsurprising to see a similar result in the HCI field.