Speech Recognition Becomes Continuous
by Yoshinori H. T. Himel
On March 18, 1998, Michael Moser, president of EXAQ Micro Services in Citrus Heights, returned to update members of the Sacramento Lawyer Computer Users Group (SLUG) on the latest developments in speech recognition technology. EXAQ is a speech integration company; it puts together speech recognition software, hardware and training for two markets: rehabilitation, for persons with disabilities, and professional productivity. Moser had demonstrated DragonDictate, from Dragon Systems, Inc., at a SLUG meeting a year earlier.
Continuous vs. Discrete Speech
Moser started with some history. In 1984, Dragon Systems became the first vendor to build speech recognition capability into PCs. From 1984 to 1997, the market consisted primarily of "discrete speech" products.
You may have noticed that, although a written sentence consists of words separated by spaces, normally there is no separation between words in a spoken phrase. Discrete speech voice recognition products, however, need the speaker to specify word boundaries by inserting a slight pause between words. As Moser showed last year, you can speak quite rapidly even with the pause, but it has to be there. Using discrete speech, you can dictate into a document or you can give control commands, such as "open," "save," or "exit."
In 1997, continuous speech products came on the market. With these, you can dictate without the inter-word pause.
Continuous Speech On Screen
In March, Moser demonstrated Dragon Systems' newest offering: its NaturallySpeaking Deluxe continuous speech software. With this $695 package running on his laptop machine, Moser faced the SLUG attendees, with the projection screen behind him, and dictated rapidly into a microphone. Without his watching, an entire letter took shape on the screen. NaturallySpeaking is quicker and more accurate than its predecessor for dictating documents, he said.
The letter contained an erroneous phrase. Turning to the screen, Moser told the computer to "select" the phrase. The software duly searched for and highlighted it. Moser then enunciated the phrase he wanted, and the program overwrote the erroneous phrase.
This whole-document dictation sequence resembles that of IBM's voice recognition packages, IBM Voice Dictation (discrete) and IBM Via Voice Gold (continuous), and differs from that of Dragon's discrete speech product, DragonDictate, which encouraged the user to correct transcription errors as they occurred. Moser explained that the dictate-first, correct-later sequence used by NaturallySpeaking better accommodates the flow of your thoughts, free of interruption by correction tasks. Should any of the dictated text be unrecognizable, a .wav file contains a sound recording of your dictation.
Dictation Tapes
The dictate-first, correct-later sequence brings to mind the dictation micro-cassettes that attorneys used to give their secretaries. Now, an adapter can connect a portable transcription recorder to a computer equipped with Dragon NaturallySpeaking, so that the computer can transcribe from tape.
Controlling Your Computer
What about the control commands, such as "open," "save," or "exit," that discrete speech packages handled? Moser said that these packages still are useful for computer control, and for some non-word-processing applications like spreadsheets and databases. In fact, NaturallySpeaking includes a version of DragonDictate for just these purposes.
Moser also mentioned that speech recognition software, with the proper additions, can voice-actuate lights and appliances. This could give persons with disabilities, and eventually everyone, more convenient control over their environment.
Recognizing Handwriting
Moser asked how many of the attendees had used optical character recognition (OCR). About one-third had. An electronic pen and tablet, plus pen-based OCR software, can render your handwritten notations as computer document text, he said. Can a resurgence of yellow tablet drafters be at hand?
Hardware Requirements
To run Dragon NaturallySpeaking Deluxe, Moser recommended a fast Pentium-class processor with MMX and 48 to 64 MB of RAM. Windows 95 or NT, a suitable sound card, and 65 to 120 MB of hard disk space are required.
NaturallySpeaking is quite compatible with Microsoft Word, giving you 95% of Word's functions through speech, he said. (NaturallySpeaking does not affect keyboard and mouse functions.) With other Microsoft applications such as Excel, and other word processors such as WordPerfect, compatibility is significantly less. Dragon is working with Corel under an agreement to develop compatibility with WordPerfect. Compatibility modules such as the one for Word are available free on the Internet as they are developed, he said.
Questions of Accuracy
SLUG member Jim Mize asked Moser about a PC Magazine report that rejected NaturallySpeaking and Via Voice because its testers observed accuracy rates below 90%. That would mean an average of more than one error in 10 words. Moser agreed with PC that 90% accuracy is unacceptable; 95% is borderline and 97-98% is good, he said. To PC's test results, he responded that magazine software testers typically do not have time to train themselves and the software for optimal performance. In his customers' experience, accuracy rates for NaturallySpeaking are much better than in PC's tests, he said.
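To see what these percentages mean in practice, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names and the 300-word letter are assumptions made for the example, not figures from Moser or PC Magazine.

```python
def words_per_error(accuracy_percent: float) -> float:
    """Average number of dictated words per recognition error."""
    if not 0 <= accuracy_percent < 100:
        raise ValueError("accuracy must be at least 0 and below 100 percent")
    return 100 / (100 - accuracy_percent)


def expected_errors(accuracy_percent: float, word_count: int) -> float:
    """Average number of misrecognized words in a document of word_count words."""
    return word_count * (100 - accuracy_percent) / 100


# Hypothetical 300-word letter, used only to make the percentages concrete.
LETTER_WORDS = 300

for accuracy in (90, 95, 97, 98):
    print(
        f"{accuracy}% accuracy: one error every "
        f"{words_per_error(accuracy):.1f} words, about "
        f"{expected_errors(accuracy, LETTER_WORDS):.0f} errors per "
        f"{LETTER_WORDS}-word letter"
    )
```

At 90% the hypothetical letter would need about 30 corrections; at 98%, about 6. That gap is why a few percentage points separate "unacceptable" from "good."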
Moser said that an attorney can expect to do basic dictation after about an hour of instruction. Another hour spent training the software and experimenting is desirable for increased capability and dictation accuracy. The user (or Moser's company) can spend additional time programming multi-step tasks into "command macros." For example: "Fax this to Sam."
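The idea of a command macro, one spoken phrase standing for several steps, can be pictured with a short Python sketch. It is not Dragon's macro language, which was not shown at the meeting; the table, the step descriptions, and the function name below are all hypothetical.

```python
from typing import Dict, List

# Hypothetical macro table: each spoken phrase maps to an ordered list of steps.
# The step descriptions are placeholders, not real NaturallySpeaking commands.
MACROS: Dict[str, List[str]] = {
    "fax this to sam": [
        "open the fax dialog",
        "fill in the recipient: Sam",
        "send the current document",
    ],
}


def expand_command(spoken: str) -> List[str]:
    """Return the steps for a recognized phrase, or an empty list if none match."""
    # Normalize case, trailing punctuation, and extra spaces before the lookup.
    key = " ".join(spoken.lower().strip(" .!?").split())
    return MACROS.get(key, [])


for step in expand_command("Fax this to Sam."):
    print(step)
```

The point of the sketch is only that the time spent programming a macro is paid once, after which a single utterance replaces the whole sequence.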
Because NaturallySpeaking is designed to take rapid continuous dictation, pace matters. For example, the letter "t," pronounced slowly, can come out as a "t" followed by an "e," he said. Clear articulation also helps; he encouraged users to dictate as if delivering a speech, not casually conversing. NaturallySpeaking can help you improve your speech habits, he said.
An attendee asked whether there is a problem when the user has a cold. The difference in your voice affects accuracy, Moser said. To keep the software's profile of your voice from adapting to atypical, cold-muffled speech, you can choose not to have it save to your voice file at the end of the job, he said. Or, an attendee suggested, a chronically afflicted user can tell the software there are two users: one without the cold and one with it.
Finally, Moser was asked whether documents generated by speech recognition software should be spell-checked. The answer generally is no, because the words in the software's vocabulary are spelled correctly. The exception is where the user spells out words. Grammar checkers are still useful, he said.
Reference
Dragon Systems, Inc., 320 Nevada St., Newton, MA 02160, (617) 965-5200, fax (617) 527-0372, www.naturallyspeaking.com
IBM Corp., 1133 Westchester Ave., White Plains, NY 10604, (800) 825-5263, www.ibm.com/products
Michael Moser, President and CEO, EXAQ Micro Services, 6359 Auburn Blvd., Ste. B, Citrus Heights 95621, 722-2358, fax 722-0839, www.exaq.com