
Introduction

It's time for the world to be usable. People are ready.

Users are no longer passively frustrated when things don't work. They regularly suggest improvements. They use words like "usable" and "citizen-friendly" and even "task flow." They no longer just complain about time lost to inefficient products. They do the math. But not on their cell phones.

Today we know that consumers evaluate and select both products and services based on the user-friendliness of an interface.

[Image: Toyota Prius]

Blink! usability matters

But it gets even better. Executives have discovered the value of usability. You hear the word "usability" in elevators all the time.

It's clear from those overheard conversations that executives who understand that usability can be a strategic differentiator don't always grok the practical details of what is involved. But that's not important. All the gurus agree (!) that the first step in making usability routine is getting the support of an executive champion.

If the executives will back it – blink! – usability matters.

Practitioners, not gurus.

User-centered design is being systematically integrated into Web, application, and product development processes. It's the tipping point usability specialists have been waiting for. But are we ready? Does the field have the tools and resources – or, for that matter, the people – to keep up with the need?

To keep up, usability needs to do two things.

First, usability needs to transition away from the can-you-believe-it? high-cost boutiquey market that defines the industry today. If organizations are really going to adopt and embed usability in their day-to-day processes, it can't be guru-expensive.

Second, the industry needs to evolve standards with common practices*, tools, and resources that support scalability, because there aren't really enough gurus to go around anyway. Sure, we probably should keep our gurus, but we also need to create a legion of practitioners who can pick up the slack and do the work. Usability needs to become a practice, not just an art.

This means that the industry needs to agree upon both what it is we are doing when we "do usability" and how we should go about doing it. User-centered methods should guide practitioners in collecting and analyzing user data to support informed design decisions. The methodologies need to be robust and replicable. Applying the same method in the same environment should yield a similar (though not necessarily identical) result.

Let's make it concrete. If usability is to scale, our understanding of what usability IS and how to do it has to be consistent enough that different organizations, asked to evaluate the same application, will return roughly the same list of challenges and recommendations. Usability is at least that evolved, right? There are variations on the methodological theme, but does the output vary that much?

None of these things is just like the others...

You may be surprised. Even a task as (seemingly) transparent as a usability test of Microsoft's hotmail.com elicited different data depending on the team's approach to usability testing.

Molich, Ede, Kaasgaard and Karyukin (2004) report on the findings of the Comparative Usability Evaluation study (CUE-2). This meta-analysis describes the usability testing approaches and results across nine independent usability groups asked to conduct a "standard" usability test of hotmail.com. The teams included industry labs, university-based teams with commercial activities and student teams. Each team was provided the same project background information and access to a "Marketing Liaison" for further clarification or feedback on their proposed methods.

Molich and colleagues compared and contrasted the usability testing approach, usability problems discovered, and reporting of findings across the teams. Their finding is jarring: " ...our simple assumption that we are all doing the same and getting the same results in a usability test is plainly wrong" (p. 65).

The details – particularly if you think of usability testing as a process-driven task – are equally jarring:

The teams
Usability teams ranged from 1 to 7 members in size and spent from 16 to 218 hours conducting the test.

Selection of Method
Eight out of the nine participating teams used some variation on think-aloud testing to conduct the usability review. The commonalities largely end here.

The various teams tested 6.6 participants on average, with a range from 4 to 50 across the teams. (The team testing 50 participants used a semi-structured exploration/questionnaire approach with no direct observation of users completing tasks.)

Interacting with the "client"
Only two of the nine teams solicited client input beyond the initial briefing during the usability testing project.

Developing the testing protocol
The project briefing provided to each team outlined 18 features that the Hotmail team indicated could be enhanced through user input. Five were listed as top priority.

Despite this client-based direction, the overlap in tested tasks was limited: 51 different tasks appeared across the testing protocols. Only two tasks were common to all of the teams (register; send someone e-mail). 25 (49%) of the tested tasks were proposed by only one team.

Leading the witness...
Eight of the nine testing teams used leading questions in their testing protocols. Leading questions contain hidden instructions or cues, such as "Create a personal signature" in a context where the user needs to click on a link containing the word "signature." They test participants' ability to recognize keywords rather than their ability to complete the task.

Usability problems uncovered
The usability teams reported from 10 to 149 problems. No single usability problem was reported by all nine testing teams; one problem was reported by seven of the nine.

For the two tasks that were tested by all teams, 232 unique problems were reported; 75% of those problems were identified by only one team.
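
For readers who want to see the arithmetic behind overlap figures like these, here is a minimal sketch in Python of how such statistics can be computed once each team's report is reduced to a set of problem IDs. It is not taken from the CUE-2 study; the team names, problem IDs, and counts are hypothetical placeholders.

from collections import Counter

# Hypothetical data: each team's report reduced to the set of unique
# problem IDs it identified (placeholders, not CUE-2 data).
reports = {
    "Team A": {"P01", "P02", "P07"},
    "Team B": {"P02", "P03"},
    "Team C": {"P02", "P04", "P05", "P06"},
}

# Count how many teams reported each problem.
problem_counts = Counter(p for problems in reports.values() for p in problems)

total_unique = len(problem_counts)
singletons = sum(1 for n in problem_counts.values() if n == 1)
reported_by_all = sum(1 for n in problem_counts.values() if n == len(reports))

print(f"Unique problems reported: {total_unique}")
print(f"Reported by only one team: {singletons} ({singletons / total_unique:.0%})")
print(f"Reported by every team: {reported_by_all}")

With data like CUE-2's, the same few lines would reproduce the "232 unique problems, 75% reported by only one team, none reported by all nine" style of summary.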

Reporting the findings
Many violations of best practices in usability testing reports (see Dumas and Redish, 1999) were noted. Key among those were:

  • 4 out of 9 reports failed to include an executive summary.
  • 7 out of 9 reports contained two or fewer screenshots (3 reports had no screenshots at all).
  • Reports failed to indicate problem frequency.
  • 3 out of 9 reports failed to prioritize problems based on severity.
  • Reports identified too many problems to be useful. (Molich and colleagues suggest 15-60 problems as a manageable number.)

Quality of findings
Two interesting findings emerge. First, student reports are not easy to distinguish from professional reports.

Second, the results of the one indirect testing team differed from those of the direct testing teams. The indirect team reported far fewer problems than the direct observation teams and also failed to observe the one serious problem that was identified by 7 of the 8 remaining teams. Molich observes, "Unattended testing didn't lead to any more (in fact, quite a bit less) reported problems and didn't provide insights that other methods [missed]." (p. 73).

Ask 5 witnesses... get 5 stories.

Just as in interface design, consistency is critical to the success of usability as a field. If the task of usability testing is this inconsistent, what can that mean for user-centered analysis and design projects?

Molich and colleagues' findings suggest significant variability in both execution and findings across the task of usability testing. These were (mostly) professional-level groups. They proactively volunteered to be evaluated. One can only assume they set out to present their best work in this very public venue.

The world may be ready for usability. Molich's study indicates that there is still a lot of art in the science of usability testing. But if usability is an art, can that art be made routine?


References

*Note that people like Mary Theofanos at organizations like NIST are working on this.

Dumas, J.S. & Redish, J.C. (1999). A practical guide to usability testing. (Revised edition) Bristol, UK: Intellect.

Molich, R., Ede, M.R., Kaasgaard, K. & Karyukin, B. (2004). Comparative usability evaluation. Behaviour & Information Technology, 23(1), 65-74.

Comparative Usability Evaluation - CUE

