The hallmark of modern particle physics is the "uncertainty principle": you can know either the position of a so-called wave-particle or its momentum precisely, but not both at once. A popular way to picture this conundrum is that the very act of observation changes "reality" from probability (the wave-particle might be here) to actuality (hah, gotcha pinned down).
Could the same hold true for usability testing?
As Bob Dylan once asked, "What is real?" Have you wondered whether "thinking out loud" makes your test subject more attentive than they otherwise might be? Or perhaps the think-aloud effort competes for attention and distracts the subject from engaging fully in their task? As a clue to how this works, see if you can remember any moments when your test subject started to fade out... they squished up their brows, hunched over the keyboard, stared intensely at the monitor, and simply faded out... no talk, no anything. Of course, your standard procedure is to say "tell me what you're thinking," or "tell me what you're looking at." (Your measure of professional attainment is the variety of ways in which you can ask these questions without sounding like a parrot-savant.) But are you actually derailing an intense thought process with these innocent reminders? Have you occasionally just let your subject ruminate, lost in the airy spaces of problem solving, hoping they would soon come back to terra firma and report their findings to you? But then, can they remember the exquisite and lengthy chain of logic that led them to the "wrong choice"?
Like Aikido, Kung Fu, maybe even kick boxing, the art of winning often comes down to the art of leverage... using your opponent's momentum and making it go where you want it to go rather than landing on your soft body. In physics terms, this is the art of tilting wave-particle probabilities in your favor.
In usability testing we have some choices, too. How can we reduce the odds of getting hit with bad data? Can we shape the trajectory of a test session so that subjects reveal the insights we know can improve our design?
Van Den Haak and colleagues (2003) investigated the effect of thinking out loud by comparing it with an alternative approach called "retrospective" reporting. Both approaches were applied to the same usability testing scenario: university students trying to use an online library catalog.
Subjects were second- and third-year Dutch students majoring in the same subject, with some prior knowledge of online catalogs.
Twenty of these students completed 7 tasks using the normal "concurrent" think-aloud method we all use, and twenty others did the same tasks without commenting aloud. Instead, the latter group commented afterward while reviewing a video recording of their own performance.
The outcomes fell into two categories: problems that the facilitator observed and problems that the subject verbalized.
With the concurrent, think-aloud method, the facilitator observed more problems than with the "retrospective" method, in which subjects stayed silent during the task.
So, why were there more problems for the concurrent, think-aloud method? Did the test subject pay more attention to their task and thus recognize more problems? Or did the extra effort of talking during the test "cause" more problems? More on this later.
When considering verbalization of problems, the advantage switched to the retrospective method. While watching their videos, the retrospective subjects offered many more comments than the concurrent subjects.
This implies that the concurrent subjects may not have said what they really felt – they failed to report all their observations. Or, as suggested by the researchers, the retrospective subjects were able to report additional problems that were not related to the observed problems.
For both methods, the researchers report that the combined number of observed and verbalized problems came out about the same. They therefore concluded that the two methods were more or less equally sensitive to problems, and suggested that you could use either one, depending on how easily subjects can verbalize during the test.
If subjects have difficulty speaking during the task (because it demands heavy mental work), the retrospective method is fine. Otherwise, use the concurrent method, because it takes only about half the time: you don't need to review the video with the subject afterward.
However, the researchers also asked whether there was still some evidence that the extra cognitive work of thinking out loud during the test influenced the overall outcome. Indeed, they calculated that the concurrent subjects completed only 2.6 of the 7 tasks successfully on average, whereas the retrospective subjects completed 3.3.
Although this difference was not statistically significant, it came close enough to significance to suggest that the extra workload of thinking out loud could influence the results.
Again, we are left with wave-particle duality and a probabilistic interpretation of the results. Well, if you must reach a concrete conclusion, the statistics say there was no reliable difference, just the possibility of one. You get to decide...
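To make "not statistically significant" more concrete, here is a minimal sketch of a two-sample permutation test in Python for a comparison like tasks completed per subject in two conditions. It assumes you have one count per subject in each group; the function name `permutation_p_value` and the example counts are hypothetical, they are not the study's data, and this is not necessarily the test the researchers used.

```python
import random
from statistics import fmean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Estimates how often a mean difference at least as large as the observed
    one would appear if the condition labels were assigned at random.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign subjects to the two conditions
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate above zero.
    return (extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical tasks completed (out of 7) per subject; NOT the study's data.
    concurrent = [2, 3, 1, 4, 2, 3, 2, 3, 4, 2]
    retrospective = [3, 4, 2, 4, 3, 5, 3, 2, 4, 3]
    print(f"p = {permutation_p_value(concurrent, retrospective):.3f}")
```

A small p-value (conventionally below 0.05) would suggest the gap is unlikely to be chance alone; a p-value somewhat above that threshold is the "probability of a difference" territory described above.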
A different research group investigated the influence of an unmoderated "remote testing environment" compared to the typical "laboratory" environment.
Schulte-Mecklenbeck and Huber (2003) asked 40 German university students to complete certain Web-based tasks in a typical lab setting. Meanwhile, they had 32 similar students do the same at their homes, using an automated, unmoderated, remote testing paradigm.
Interestingly, the students in the lab completed the tasks using about twice as many clicks and twice as much time compared to the students at home. Whew!
Was the actual task twice as hard in the lab? Or was the perceived pressure twice as much in the lab?
Since the task was identical in both cases, the authors conclude that subjects in the lab felt more pressure to perform well, and thus made more effort to do well. The authors suggest that they perceived the facilitator as an "authority figure". (How much authority do you command in your lab? Any at all?)
Also, subjects may have perceived the lab setting as "more important" than using the Web at home. (Well, home is for relaxing... but wait, maybe that's where your Web site is most often used?)
Either way, the observer and the setting influenced the results. Here, the automated test, with subjects out of view of an overseer, appears to have produced more representative results, at least when home is where the real-life version of the tasks takes place.
These results differ from prior research comparing Web vs. lab. Prior research suggests that Web and laboratory behaviors are about the same. For example, results from a psychological test taken on the Web were comparable to test norms obtained through pencil and paper. In other cases, subjects responded to line drawings and photographs presented on the Web with results similar to face-to-face responses. Even risk-seeking in a lottery setting was similar between Web and laboratory settings.
What was different about this particular test? Why should this laboratory experience generate more diligent responses?
The authors speculate that the nature of the task lends itself to a greater range of responsiveness among subjects. The tasks in this study were open-ended, information-seeking tasks. That is, participants were not seeking a "correct answer." Rather, they sought to find enough information to make a decision.
Thus, participants decided for themselves when to stop: when they felt they had found enough information to make a decision.
Does this sound like the tasks your Web site supports? Probably so, if you work on one of the many large-scale information sites on the Internet, or even on an intranet.
In any event, we just saw an example where the act of observation definitely influenced the outcome. The uncertainty principle operates in usability testing, just as in testing the behavior of sub-atomic particles.
Just to give us a sense of the normal reality found in classical physics, let's take a look at another comparison study.
Thomas Tullis, a well-known usability author and researcher, worked with his colleagues (2002) to check whether automated, unmoderated, remote usability testing gives results similar to those of laboratory testing. Subjects were employees of a US corporation.
In contrast to the study reported above, Tullis used "closed-ended" tasks. That is, the results were either right or wrong. Does that sound like some of your testing outcomes?
If so, you can have some assurance that the laboratory setting doesn't upset your results. Tullis found that for the 13 tasks they used, subjects in the lab gave results similar to those of subjects in the unmoderated, remote testing environment. His team found similar task success rates and similar task times. No authority effect here.
Actually, Tullis and team were more interested in whether the unmoderated remote testing was as effective for finding problems as the lab environment. They found that remote testing worked well and had benefits that complemented the lab environment. (Do you recall the benefits of testing more subjects? See our May, 2004 newsletter.)
Aha! More subjects can be better, they found. Whereas the lab setting found 9 issues, the remote test found 17 issues. Well, what do we expect if the lab only has 8 subjects and the remote test has 88 subjects?
Interestingly, the law of diminishing returns did not penalize the lab environment unduly. Tullis and crew felt that both the lab and remote environments discovered the three major problems (overloaded home page, general terminology problems, and unclear navigation wording).
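The diminishing-returns point can be illustrated with the simple binomial problem-discovery model often used in the usability literature; it is not part of the Tullis study. If a given problem affects each subject independently with probability p, the chance that at least one of n subjects encounters it is 1 - (1 - p)^n. The sketch below assumes that model, the function name `discovery_probability` is hypothetical, and the p values are illustrative rather than estimates from the studies discussed here.

```python
def discovery_probability(p, n):
    """Chance that at least one of n subjects hits a problem that each
    subject encounters independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative problem frequencies (assumed, not taken from the studies).
    for p in (0.5, 0.2, 0.05):
        lab = discovery_probability(p, 8)      # lab sample size in the Tullis study
        remote = discovery_probability(p, 88)  # remote sample size in the Tullis study
        print(f"p={p:.2f}: lab (n=8) {lab:.2f}, remote (n=88) {remote:.2f}")
```

Under this model, a problem that affects half of all users is almost certain to surface with 8 subjects, while one that affects 5% of users has only about a 0.34 chance with 8 subjects versus about 0.99 with 88. That pattern is consistent with both settings catching the major problems while the larger remote sample turned up more issues overall.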
Plus, seven out of the nine problems found in the lab were also found in the remote test.
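These counts also tell us how many distinct issues the two tests found between them. A quick inclusion-exclusion check in Python, assuming the 7 shared issues were matched one-to-one across the lab and remote issue lists (the function name `combine_issue_counts` is hypothetical):

```python
def combine_issue_counts(lab, remote, shared):
    """Inclusion-exclusion on two issue lists that have `shared` issues in common."""
    if shared > min(lab, remote):
        raise ValueError("shared issues cannot exceed either list")
    return {
        "lab_only": lab - shared,                  # found only in the lab
        "remote_only": remote - shared,            # found only in the remote test
        "distinct_total": lab + remote - shared,   # union of both lists
    }


print(combine_issue_counts(lab=9, remote=17, shared=7))
# {'lab_only': 2, 'remote_only': 10, 'distinct_total': 19}
```

So under that assumption, the remote test found 10 issues the lab missed, the lab found 2 the remote test missed, and together they found 19 distinct issues.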
However, more subjects can be better, as we said. And that was the benefit of the remote test. After all, we test to find problems, don't we?
What other influences of the remote testing environment appear valuable, aside from the absence of the observer?
Tullis and group found they got greater diversity of users in task experience, computer experience, and individual characteristics. They also got more hardware variety, such as screen resolution. And they were pleasantly surprised by how complete and insightful the typed responses were.
But the lab offered value, too. For example, the remote test revealed that nearly all subjects used a screen resolution of 1024 and uncovered a problem with small fonts. The lab setting forced a resolution of 800 and revealed excessive scrolling requirements. The lab also showed that certain navigation options were overlooked, although the remote results showed that most subjects eventually found those options anyway.
Tullis and group recommend a combination of both remote and lab testing to cover the range of issues.
So, now we know the observer influences the observation, just like real-life physics.
To recap the first study: the concurrent, think-aloud method yielded more observed problems than retrospective reporting, but fewer verbalized problems. However, the combined number of observed and verbalized problems was equivalent for the two methods. So if you have complex tasks that make it hard for subjects to think aloud while working, feel comfortable showing them their video afterward. They can talk plenty while watching it.
From these studies we do get some guidelines, although the findings may be qualified by the nationality of the subjects, their student status, or any of many other differences from your target population.
But that's life.
All we can do, like physicists, is take a chance. And make it work.
Janni Nielsen of the Copenhagen Business School presented a paper at the India HCI conference today (12/6/2004). In her paper she reports an interesting hybrid approach. She records task completion without thinking aloud, capturing the screen, the interaction (including mouse movement), and the participant's facial expression. Her participants are trained to move their mouse to the areas of the screen they are paying attention to.
After task completion she shows the recording to participants and has them describe their mental process. She reports that participants have a "mental tape" of the session and can provide substantial insight based on prompting with their recorded interaction. She then tapes the user's interpretation along with the interaction.
Nielsen, J. (2004). Reflections on Concurrent and Retrospective User Testing. Session G2, New Directions in HCI, IHCI 2004, Dec. 6–7, Bangalore, India.
Schulte-Mecklenbeck, M., and Huber, O. (2003). Information Search in the Laboratory and on the Web: With or Without an Experimenter. Behavior Research Methods, Instruments, & Computers.
Tullis, T., Fleischman, S., McNulty, M., Cianchette, C., and Bergel, M. (2002). An Empirical Comparison of Lab and Remote Usability Testing of Web Sites. Usability Professionals Association Conference, Orlando, Florida.
Van Den Haak, M.J., De Jong, M.D.T., and Schellens, P.J. (2003). Retrospective Versus Concurrent Think-Aloud Protocols: Testing the Usability of an Online Library Catalog. Behaviour & Information Technology, 22(5), 339–351.
Enjoy your newsletter, as always.
A comment concerning the use of the retrospective think-aloud method, with video or without. I find myself a bit concerned about using this method. Has it been tested several times to:
Don't know how much these factors impact this method, but the potential threats are at least something that I would stick in as a caveat in a test report.
Otherwise, this article is such a keeper and should help as we decide what sort of testing to do on future redesigns.
Excellent information on observer effects that should be followed.
However, one brief point: let's remember that the folks who assist us in our research efforts are to be referred to as "participants" and not as "subjects". The APA publication manual (fifth edition, 2001) refers to subjects in its grammatical essence (i.e., subject-verb agreement) and "participants" as humans.
Dr. Sorflaten used "subjects" 40 times in the newsletter. Kudos to Janni Nielsen and Dr. Schaffer for their adherence to APA guidelines!
Further, as I sat in my cube this morning, I overheard a self-proclaimed expert in research methods and statistics remark that the "subjects" in their study... Need I say more?
Human Factors International, Inc.
PO Box 2020
410 W Lowe Ave
Fairfield IA 52556