
UI Design Newsletter – January, 2000


Insights from Human Factors International


In This Issue Bob Bailey reviews:

Speech recognition

Why is it taking so long for speech to be used as a primary input method?

 
Introduction
   
 

Automatic speech recognition technology has been under development for over 25 years, but has not yet come into widespread use. One of the main reasons is that speech recognition errors are fundamentally different from keying errors. Most keying errors can be traced back to users, while most speech errors can be traced back to misrecognition of the speech by the computer. In the latter case, the computer's output simply does not match what the user said.

Even though people can dictate faster than they can type, actual throughput is usually much slower with automatic speech recognition systems than with keying. A major problem is that error correction takes much longer with speech. The correction methods most commonly used with speech input are:

  1. deleting and repeating the last phrase,
  2. deleting and repeating a specific word,
  3. deleting and selecting a correct word from a list of alternative words,
  4. typing the correction.
 
Multimodal Correction
   
Model-Based and Empirical Evaluation of Multimodal Interactive Error Correction, Suhm, B., Myers, B. and Waibel, A., CHI 99 Conference Proceedings, 584-591 (1999).

Past studies have suggested that switching modality could speed up interactive correction of recognition errors. Suhm, Myers and Waibel (1999) at Carnegie Mellon University found that switching between modalities eliminated repeated recognition errors. They found that if users simply repeated their speech to correct errors, correction accuracy was much lower than if users switched to a different modality (keyboard and mouse). The correction accuracy when keying depended on the user’s typing skill. For example, the fastest typists using "keyboard and mouse" made almost three times more corrections per minute than did subjects who made corrections using "voice-only."

They concluded that multimodal correction strategies could reliably expedite error correction in speech user interfaces.

 
Throughput
   
Effect of Error Correction Strategy on Speech Dictation Throughput, Lewis, J.R., Proceedings of the Human Factors and Ergonomics Society, 457-461 (1999).

Throughput is the number of correct words produced per minute. The key variables are:

  1. the accuracy of the speech recognition system,
  2. the speaking rate of the user, and
  3. the time required to correct errors.
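The way these three variables interact can be sketched with a short calculation. This is only an illustrative model, not the method used in the study below: it assumes every misrecognized word is corrected exactly once, that each correction takes the same fixed time, and that corrections never introduce new errors. The function name, its parameters and the sample values for accuracy and correction time are hypothetical.

```python
def corrected_throughput(speaking_rate_wpm, accuracy, seconds_per_correction, words=100):
    """Estimate correct words per minute under a simple linear model.

    Assumptions (illustrative only, not taken from the cited studies):
    every misrecognized word is fixed once, each fix costs the same
    amount of time, and fixes never cause further errors.
    """
    if speaking_rate_wpm <= 0:
        raise ValueError("speaking_rate_wpm must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if seconds_per_correction < 0:
        raise ValueError("seconds_per_correction must not be negative")

    # Time spent speaking the passage, in minutes.
    dictation_minutes = words / speaking_rate_wpm
    # Expected number of misrecognized words in the passage.
    errors = words * (1.0 - accuracy)
    # Time spent repairing those words, in minutes.
    correction_minutes = errors * seconds_per_correction / 60.0
    # All words are correct once repaired, so throughput is words over total time.
    return words / (dictation_minutes + correction_minutes)


# Hypothetical comparison: same speaker and recognizer, two correction speeds.
slow = corrected_throughput(105, 0.90, seconds_per_correction=15)
fast = corrected_throughput(105, 0.90, seconds_per_correction=5)
print(round(slow, 1), round(fast, 1))
```

Even in this simplified model, correction time dominates: with a fixed speaking rate and accuracy, shortening each correction raises throughput far more than a modest increase in speaking rate would.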

Lewis (1999) at IBM evaluated the performance of participants using a speech recognition dictation system. The participants received training in one of two correction strategies, either "voice-only" or "voice, keyboard and mouse." In both cases, users spoke at about 105 uncorrected words per minute. The multimodal (voice, keyboard, mouse) corrections were made three times faster than "voice-only" corrections, and generated 63% more throughput.

 
Keyboard Correction Faster
   
Patterns of Entry and Correction in Large Vocabulary Continuous Speech Recognition Systems, Karat, C.M., Halverson, C., Horn, D. and Karat, J., CHI 99 Conference Proceedings, 568-575 (1999).

Karat et al. (1999) at IBM evaluated three speech recognition products, with users correcting errors by either "voice-only" or "keyboard and mouse." Participants were native English speakers with good typing skills.

Each participant trained one of the speech recognition systems to recognize his or her voice more readily and then completed two tasks: copying from a novel and composing replies to questions. The fastest users spoke at an average of 107 uncorrected words per minute, which resulted in about 25 corrected words per minute. The "keyboard-mouse" group completed almost three times more words per minute than did the "voice-only" group.

Participants observed that they were usually aware of when a typing error occurred, but were much less confident of being aware of when a speech error occurred. Users must either constantly glance at the display for errors, or rely heavily on proofreading after the speaking has ended.

 
Conclusions
   
 

It seems that the primary reasons developers are avoiding speech for input are that:

  1. speech recognition systems are still somewhat unreliable, and
  2. error correction continues to be difficult (and can lead to even more errors).