Usability testing is a tried-and-true methodology in our industry. Periodically it comes under fire from within and outside the usability community, but it has always stood the test of time. Although there is some variation in usability testing protocols from tester to tester, or firm to firm, the basic concepts – think-aloud techniques during the test, a usability engineer observing, logging data, and interpreting user actions – remain the same.
Is this all about to change? Recent research on usability techniques is yielding some interesting results, and may point us in different directions in the future. Do we stay the course of tradition, or do we embrace growth and change?
(Please note that the comments in this newsletter are about actual research, not just trying out different methodologies.)
For several years there has been debate in the usability community about automated testing vs. having a usability engineer present and running the test. West and Lehman conducted a study in which they compared automated testing with traditional usability engineer-led testing. Are the results and the data generated by automated testing the same as if a usability engineer ran the test?
Here's what they found: there were a few differences in the data from the automated vs. traditional testing. For example, task times were longer in the automated condition because the time spent reading the task was included in the overall task time, whereas in the traditional test the usability engineer didn't start the clock until after the participant had read the task instructions. But much of the data was the same for the automated test and the in-person test, both quantitatively and qualitatively. Failure rates were very similar, and both methods elicited plenty of participant comments.
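The timing discrepancy is easy to correct for if the automated harness logs when the participant finishes reading the instructions. Here is a minimal sketch of that idea; the function and parameter names are my own illustration, not anything from the West and Lehman study:

```python
def task_time(shown, started, finished, exclude_reading=True):
    """Return elapsed task time in seconds.

    shown    -- timestamp when the task instructions appeared
    started  -- timestamp when the participant began the task
                (e.g. clicked a "Begin task" button after reading)
    finished -- timestamp when the task ended
    exclude_reading -- if True, mimic a moderated test, where the
                       engineer starts the clock only after the
                       participant has read the instructions
    """
    start = started if exclude_reading else shown
    return finished - start

# A participant spends 12 s reading, then 60 s on the task:
moderated_style = task_time(0, 12, 72)                          # 60
automated_style = task_time(0, 12, 72, exclude_reading=False)   # 72
```

Logging the extra "began task" timestamp costs nothing in an automated tool and makes the task-time metric directly comparable to engineer-timed sessions.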
Before you get too excited or too upset (depending on whether you are a fan of automated testing or not), there is one key difference that the researchers didn't consider very important, but I disagree. With a usability engineer present in person, on average 13 additional usability problems were found that were not identified in the automated condition.
You can get valid information from automated testing, and you can use it for major benchmarking measures, but don't expect it to uncover all the critical usability issues.
One vote for tradition.
For many years, low-fidelity prototypes – paper sketches, for example – were considered the preferred alternative for testing, since they were easy and fast to create. It was believed that users would not assume you were "done with design" and would therefore be more likely to give feedback. Recently, high-fidelity prototypes have taken over, as they allow more realistic depictions of today's complicated, colorful, and richly interactive screens.
In a research study by McCurdy et al., the authors argue that we have both the question and the answers wrong. What should you be using? "Mixed-fidelity" prototypes. Characterizing prototypes as simply low fidelity or high fidelity doesn't capture the range of differences a prototype can have. The authors suggest that it is more useful to think in terms of five separate dimensions.
You decide, based on the purpose of a particular usability test, whether to use low, medium or high amounts for each dimension. Their study suggests that carefully choosing from the dimensions results in data that is closer to "real" performance data, yet you have the advantages of a lower fidelity test (easier to create and change than a final product).
One vote for change and growth.
Although some usability tests involve testing multiple designs, most test a single design, look for usability problems in it, and then iterate on that one design. Is there an advantage to testing alternative designs at the same time?
Tohidi et al. studied whether the quantity and type of comments you receive during a usability test change if you show more than one design. If you show three alternative designs, for example, do you get different feedback, or more feedback, than if you test one?
The data they collected contained interesting results and implications. When only one prototype was shown, it had higher ratings and more positive comments. People were being "nicer" about evaluating the single design. When users saw three alternative designs during the same test, then they gave more critical feedback. They weren't so "nice." The authors refer to previous analysis by Wiklund postulating that when participants view more than one prototype it sends a clear message that the designers have not yet made up their mind as to which design to use. Since a commitment hasn't been made, the researchers are seen as being more neutral, and thus the participant doesn't have to worry as much about disappointing the researcher with a negative reaction. This in turn allows the participant to be more critical.
Interestingly, in this study the researchers had a hypothesis they were testing that showing users multiple design solutions would help the users engage in participatory design. This proved not to be true. In both the one-design condition as well as the three-design condition, users did not come up with redesign suggestions. (Well, we know that users are not usually designers... this finding is not surprising).
A small but interesting finding in this study was that participants who reviewed only one design made comments, but did not totally "reject" the design. However, some of the participants in the multiple design condition did reject the entire design, saying things such as, "I would not buy this one."
One vote for change and growth.
One of the hallmarks of an in-person usability test is the think-aloud technique. Can you imagine a usability test in which the user is not thinking aloud? Well, think again. In a study by Guan et al., the researchers challenge our assumptions by looking at a technique called Retrospective Think-Aloud (RTA). The usual usability testing protocol is Concurrent Think-Aloud (CTA). There has been some criticism that CTA does not simulate normal tasks: in "real" life, users do not narrate each action aloud while doing their tasks. Lately there has been some interest in using RTA instead of CTA. With RTA, users do the tasks silently and then talk about what they did afterwards. In this study the researchers compared RTA with eye-tracking data to determine the validity of the RTA technique.
They found that people's recounting of what went on in their task performance matched the same sequence as what they attended to according to the eye tracking data. And it didn't matter whether the task was simple or complex.
However, they also found that the participants left out a lot of information. The sequence of what they said they did and why they did it matched the sequence in eye tracking, but there was a lot of information omitted. The researchers attribute this to the fact that the participants are summarizing their actions, but I wonder if this may actually be hinting at a new frontier of usability testing instead – see Question #5.
One vote for tradition.
Both CTA and RTA assume that having users monitor their own actions and reactions results in valid data. But in a fascinating book called Strangers to Ourselves: Discovering the Adaptive Unconscious, Timothy Wilson reviews theories and research indicating that the vast majority of our actions and decisions are made from non-conscious processes. In other words, although we will prattle on about why we do what we do, the real reasons are not available to our conscious minds. It's a compelling argument, with real data to back it up. So what does this mean for usability testing and the think-aloud technique? I'm still working on this one... I'm hoping someone will devise a galvanic skin response mouse so that we can measure changes in bodily functions rather than relying on meta-cognition.
One vote for growth and change.
So what's the final tally? Two votes for tradition, and three for growth and change... Hang on, it might be a bumpy ride!
Guan, Z., Lee, S., Cuddihy, E., Ramey, J. (2006). The Validity of the Stimulated Retrospective Think-Aloud Method as Measured by Eye Tracking, CHI 2006 Proceedings.
McCurdy, M., Connors, C., Pyrzak, G., Kanefsky, B., and Vera, A. (2006). Breaking the Fidelity Barrier, CHI 2006 Proceedings.
Tohidi, M., Buxton, W., Baecker, R., and Sellen, A. (2006). Getting the Right Design and the Design Right: Testing Many Is Better Than One, CHI 2006 Proceedings.
West, R. and Lehman, K. (2006). Automated Summative Usability Studies: An Empirical Evaluation, CHI 2006 Proceedings.
Wiklund, M., Thurott, C., and Dumas, J. (1992). Does the Fidelity of Software Prototypes Affect the Perception of Usability? Proceedings of the Human Factors Society 36th Annual Meeting, 399-403.
Wilson, T. (2004). Strangers to Ourselves: Discovering the Adaptive Unconscious, Belknap Press.
I thought Susan's article was fantastic: a concise, timely, and relevant summary of academic research that may influence future directions in our field.
I also wanted to point out that in market research, in Australia at least, they are starting to use neuroscience to gauge reactions to things (e.g. advertisements) in much the way that Susan alludes to in her response to Question 5. See this link, for example.
I have concerns about the usefulness of such data and the quality of the resulting analysis (much as I do with eye tracking) but I thought you might be interested nonetheless.
Finally, thank you for a consistently great read.
One thought on the generalizability of the West and Lehman paper. They used SAS employees from marketing and UI design groups. I realize they did this for expediency's sake. However, I wonder whether less techy, communications-oriented people would provide qualitative feedback of the same quality in their typed automated-test comments. I was a little suspicious when I saw that a few of the verbatims seemed more descriptive than what I might expect from test participants drawn from the public at large. I would not expect phrases such as "easy to find the dialog" and "in order to commit the changes" from the public at large. Also, at least in the examples given, some comments were very descriptive and task-focused, almost what I'd expect a usability engineer to produce when describing a problem. It would be nice to see the study done with a broader recruiting profile.
People looking for another perspective on "QUESTION #4: USABILITY TESTING = THE THINK-ALOUD TECHNIQUE?" should read relevant sections of "Blink" by Malcolm Gladwell.