EDUCATIONAL COMPUTING
Speech-Recognition Technology for Computers
Tom Kramer, M.D.; Robert Kennedy, M.A.
Academic Psychiatry 1999;23:48-50.

There have been some significant improvements in speech technology for the computer over the last 1 or 2 years. These innovations have reduced prices and generated great interest on the part of physicians and educators.

We are still far from the computer, seen in movies and science fiction TV programs, that obeys your every spoken command. Even so, the technology that lets personal computers understand the spoken word is improving and is worthy of some discussion.

Our desire to have a machine that could either accept or synthesize our speech has been high on our technology "wish list" for decades. Speech technology was demonstrated at the 1939 World's Fair, but progress has been slow while we waited for both adequate hardware and sophisticated software. Even though current computers use advanced keyboards, a mouse, and a graphic interface to simplify input, most of us are not good typists, and many of us have fantasized about how easy it would be if we could just "talk" instructions to our computers.

There are two modes or types of speech-recognition software: "discrete" and "continuous."

With "discrete speech" recognition, you must pause after each word, sounding quite mechanical or artificial. It is also very difficult for people to speak like a robot. This type was the first method of recognition, and since it forced the person to adapt unnaturally to the machine, it was a self-defeating technology.

Closer to how we normally speak is "continuous speech." Significant breakthroughs in continuous speech recognition have occurred in the last year or two. This "leap" in development has resulted in a new set of capable and inexpensive speech-recognition programs that put this technology on the right track toward a real computerized assistant that can save us from our bad typing skills.

Turning to hardware requirements, you need a sufficiently fast computer with sufficient random access memory (RAM). Today, this means a Pentium-based computer, or a Power Mac for the Mac OS, with at least 32 to 48 megabytes of RAM. Memory is the key factor: the more RAM the better. Many who purchase voice-recognition programs are very disappointed with their performance when the computer has only the minimum recommended memory.

Next, your computer needs a sound card. Before you purchase a speech software package, check that your computer's sound card is listed as compatible with it. You then need a microphone. Generally, a good microphone is provided in the box containing the speech-recognition software. The microphone is important because it filters out background noises and focuses primarily on your voice. The only other hardware that you might need is a set of speakers, if your multimedia system does not already have them.

Installation is easy, quick, and generally painless. The software requires about 150 MB of hard disk space. Once the software is installed, these programs require some initial time to "learn" your voice. We each have our eccentricities or regional accents, and we need to train the speech "engine" to become accustomed to our way of saying words. This initial process may take as little as 4 minutes or as long as 30 minutes. You read aloud paragraphs that the software already knows, and it trains itself on your voice.

You speak into the microphone, and the words that you say are captured and processed by the sound card. The software then reviews and analyzes the sound to distinguish between lower frequency vowel sounds and the higher frequency consonant sounds.
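As a rough illustration of telling lower-frequency sounds from higher-frequency ones, the sketch below counts how often a signal crosses zero. This is a simple stand-in chosen for illustration, not the method used by any of the programs discussed here. The pure tones, the sample rate, and the function names are all assumptions.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed for illustration)

def make_tone(frequency_hz, seconds=0.1, phase=0.5):
    """Generate a pure tone as a list of samples (a stand-in for recorded speech)."""
    count = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE + phase)
            for n in range(count)]

def zero_crossings(samples):
    """Count sign changes; a higher-pitched signal changes sign more often."""
    return sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))

vowel_like = make_tone(200)       # hypothetical low-frequency sound
consonant_like = make_tone(4000)  # hypothetical high-frequency sound
print(zero_crossings(vowel_like), zero_crossings(consonant_like))
```

The high-frequency tone produces many more crossings than the low-frequency one, which is the kind of difference the analysis step can use.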

Next comes the word-matching phase. Sounds are matched in two different ways. First, the software matches your sound against a library of similar sounds. Second, using a concept known as "language modeling," it assesses the likelihood that a given word would appear between or after particular words.
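The idea behind language modeling can be sketched with simple word-pair counts. The example below is a minimal illustration, assuming a tiny made-up set of sentences and add-one smoothing. The function names and the sample sentences are hypothetical, and commercial programs use far larger and more refined models.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for previous, word in zip(words, words[1:]):
            counts[previous][word] += 1
    return counts

def follow_probability(counts, previous, word):
    """Estimate how likely `word` is to follow `previous` (add-one smoothing)."""
    vocabulary = set(counts)
    for followers in counts.values():
        vocabulary.update(followers)
    followers = counts.get(previous, Counter())
    return (followers[word] + 1) / (sum(followers.values()) + len(vocabulary))

# Hypothetical training sentences, for illustration only.
corpus = [
    "the patient reports pain",
    "the patient denies pain",
    "the patients were seen",
]
counts = train_bigrams(corpus)
print(follow_probability(counts, "the", "patient"))
print(follow_probability(counts, "the", "patience"))
```

Because "patient" has followed "the" in the sample sentences and "patience" has not, the first estimate comes out higher, which is how a language model can help choose between similar-sounding words.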

Finally, in the decoder phase, the most likely choice is made based on a ranking of the results of the previous steps, and the word appears on your screen for you to accept or correct.
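One way to picture the ranking step is as a weighted combination of two scores for each candidate word: how well it matches the sound and how well it fits the surrounding words. The sketch below assumes a simple weighted sum with made-up scores. The function name, the weight, and the numbers are hypothetical and are not taken from any of the products discussed.

```python
def rank_candidates(acoustic_scores, language_scores, language_weight=0.5):
    """Order candidate words from most to least likely (illustrative weighted sum)."""
    def combined(word):
        return ((1 - language_weight) * acoustic_scores[word]
                + language_weight * language_scores[word])
    return sorted(acoustic_scores, key=combined, reverse=True)

# Hypothetical scores for two similar-sounding candidates.
acoustic = {"patience": 0.55, "patients": 0.45}   # match against the sound library
language = {"patience": 0.10, "patients": 0.90}   # fit with the neighboring words
best = rank_candidates(acoustic, language)[0]
print(best)
```

Here the sound alone slightly favors "patience," but the context score tips the final choice to "patients."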

If the word that appears on your screen is not what you said, you can correct it immediately and continue, or you can wait until later. You can not only edit but also add words or entire phrases "on-the-fly." Formatting "as you go" is also possible: you can change the look of paragraphs with fonts, boldface, italics, etc., either as you dictate or after you have finished.

Speaking at about 150 words per minute, users can expect between 80% and 95% accuracy with most of the programs, depending on the clarity of their diction. These programs maintain around 30,000 or more words in RAM (instant access), with maximum active vocabularies at about 80,000 words, and reference dictionaries on disk of over 200,000 words.
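To put these accuracy figures in practical terms, the short calculation below converts them into misrecognized words per minute at the stated speaking rate. The function name is an assumption made for this illustration.

```python
def expected_errors_per_minute(words_per_minute, accuracy_percent):
    """Words per minute that would need correction at a given accuracy."""
    return words_per_minute * (100 - accuracy_percent) / 100

for accuracy in (80, 95):
    print(accuracy, expected_errors_per_minute(150, accuracy))
```

At 150 words per minute, 80% accuracy leaves about 30 words per minute to correct, while 95% accuracy leaves about 7 or 8.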

You can switch between dictation mode and command mode. In command mode, you can control the menus on your computer by voice. This is an important feature, since you may wish to switch from one document to another, cut and paste paragraphs, or move information from one program to another. Command mode gives you voice control over your computer, which takes this concept well beyond the limits of a word-processing program and certainly toward the future.
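The two modes can be pictured as a simple switch that decides where each recognized phrase goes. The sketch below is an illustration only. The class name and the spoken phrases used to change modes are hypothetical, and each real product has its own way of switching.

```python
class VoiceSession:
    """Route recognized phrases to the document or to the command list."""

    def __init__(self):
        self.mode = "dictation"
        self.document = []   # phrases typed into the document
        self.commands = []   # phrases treated as menu commands

    def hear(self, phrase):
        phrase = phrase.strip().lower()
        if phrase in ("dictation mode", "command mode"):  # hypothetical switch phrases
            self.mode = phrase.split()[0]
        elif self.mode == "dictation":
            self.document.append(phrase)
        else:
            self.commands.append(phrase)

session = VoiceSession()
for spoken in ["dear doctor", "command mode", "open file menu",
               "dictation mode", "thank you"]:
    session.hear(spoken)
print(session.document, session.commands)
```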

There is also support for multiple users in an office environment. Many offices are networked, and there may be different people using the same word-processing program. Each user can log in by name to activate his or her own voice "profile," which eliminates the need to retrain the software.
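The benefit of stored profiles can be sketched as a lookup by user name: training runs only the first time a name is seen, and later logins reuse the saved result. The class, the stand-in training function, and the user names below are hypothetical.

```python
class ProfileStore:
    """Keep one trained voice profile per user name."""

    def __init__(self, train):
        self._train = train      # function that performs the initial voice training
        self._profiles = {}

    def log_in(self, user_name):
        # Train only if this user has no saved profile yet.
        if user_name not in self._profiles:
            self._profiles[user_name] = self._train(user_name)
        return self._profiles[user_name]

training_runs = []

def fake_training(user_name):
    """Stand-in for the real training session; records that it ran."""
    training_runs.append(user_name)
    return {"user": user_name, "adapted": True}

store = ProfileStore(fake_training)
for name in ["user_a", "user_b", "user_a"]:
    store.log_in(name)
print(training_runs)
```

The second login by the same user does not trigger training again.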

These programs work with most of the popular word-processing programs, and some can be used with other software, such as E-mail programs. Another great feature of many of these programs is the built-in ability to read text back to you out loud, which allows the user to check the correctness of the dictation. It can also double as an E-mail reader, letting you listen to your messages while you do something else, much like voice mail or an answering machine.

These programs cannot do punctuation automatically. You need to say "comma" or "period." The same is true of paragraphs and formatting: you need to tell the program, for example, "new line" or "new paragraph." Nor do these programs improve your grammar or writing style. Although there are grammar evaluation components in many of the popular word-processing programs, speech-recognition programs are not designed to make you a better writer; they just make the process easier. An accuracy rate of 80% to 90%, although quite good, still requires some vigilance on the part of the person dictating. As the technology continues to improve, perhaps speech-recognition programs will become more intuitive or help improve our writing skills.
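How spoken punctuation and layout words turn into text can be sketched as a small substitution step. The example below is a minimal illustration that handles only the four spoken forms mentioned above. The function name, the capitalization rule, and the sample dictation are assumptions.

```python
SPOKEN_MARKS = {"comma": ",", "period": "."}
SPOKEN_BREAKS = {"new line": "\n", "new paragraph": "\n\n"}

def render_dictation(tokens):
    """Turn recognized words and spoken punctuation into formatted text."""
    pieces = []
    capitalize = True    # capitalize the first word and the word after a period or break
    need_space = False
    for token in tokens:
        if token in SPOKEN_MARKS:
            pieces.append(SPOKEN_MARKS[token])
            need_space = True
            if token == "period":
                capitalize = True
        elif token in SPOKEN_BREAKS:
            pieces.append(SPOKEN_BREAKS[token])
            need_space = False
            capitalize = True
        else:
            word = token[:1].upper() + token[1:] if capitalize else token
            pieces.append((" " if need_space else "") + word)
            need_space = True
            capitalize = False
    return "".join(pieces)

# Hypothetical dictation, already split into recognized words and phrases.
spoken = ["the", "patient", "is", "stable", "period", "new paragraph",
          "follow", "up", "in", "two", "weeks", "period"]
print(render_dictation(spoken))
```

A sketch like this cannot tell when the speaker wants the literal word "period" in the text, which is one reason the person dictating still has to watch the screen.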

As mentioned before, there are features built into these speech-recognition programs that can read text back to you. This feature is called "text-to-speech."

It is worth mentioning that for those who want only this particular feature from their computer, there are several programs on the market that read text aloud with incredible sophistication. Some were developed for people with dyslexia or voice impairments.

These programs function as proofreaders, learning tools, and audio assistants, which allow you to listen to E-mail, reports, documents, and downloaded text from the Internet.

For those seeking to expand the vocabulary of their speech-recognition programs, large medical, legal, and other professional dictionaries are available for purchase and can be integrated into this technology. This saves time because a dictated technical word or phrase will normally come up as an error in the average word processor, and the user must then correct the word and add it to the word list. Professional dictionaries take care of those steps.
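The time saved by a professional dictionary can be illustrated by checking dictated words against the active word list with and without the added vocabulary. The tiny word lists, the sample sentence, and the function name below are hypothetical.

```python
def unknown_words(dictated_text, vocabulary):
    """Return the dictated words that are missing from the active vocabulary."""
    return [word for word in dictated_text.lower().split()
            if word not in vocabulary]

# Hypothetical word lists, far smaller than any real dictionary.
general_vocabulary = {"the", "patient", "was", "started", "on"}
medical_vocabulary = {"fluoxetine", "sertraline"}

dictation = "The patient was started on fluoxetine"
print(unknown_words(dictation, general_vocabulary))
print(unknown_words(dictation, general_vocabulary | medical_vocabulary))
```

With only the general list, the drug name is flagged and must be corrected and added by hand. With the medical list merged in, nothing is flagged.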

One vendor advertises a medical vocabulary that allows you to dictate patient histories, symptoms and complaints, general physical examination findings, etc., as well as letters to patients, insurance companies, referral letters, and a variety of form templates.

With faxing capabilities available in most personal computers sold today, applications such as this give you the flexibility of writing a report on a patient, filling out a form, and faxing both with only minimal typing on the keyboard. A prescription can be faxed to the local pharmacist, or you can transmit a patient's report to a referred physician.

Significant improvements have been made recently in speech-recognition technology. The technology will continue to grow rapidly, since those of us who cannot type are hungry for this method of getting information into and out of the computer. In the years ahead, we will see continued improvements in recognition, as well as more integration with the other applications for which we use a computer, such as Internet access, financial programs, databases, and presentation programs.

By next year, all of the major word-processing or office-suite software manufacturers will package speech software as part of the purchase. Recent developments in speech-recognition hardware (chips) have encouraged development in handheld devices without keyboards that will use speech input. These applications should also offer increased interactivity for handicapped individuals.

Five companies have a commanding lead in developing and marketing speech-recognition software today: Dragon Systems (NaturallySpeaking), IBM (ViaVoice), Lernout and Hauspie (Voice Xpress), Philips (FreeSpeech 98), and Apple (PlainTalk). Prices range from around $40.00 to $150.00. Specialized vocabularies are extra.

Dr. Kramer is Assistant Director for Training, Arkansas Mental Health Research and Training Institute, Little Rock, AR; and Mr. Kennedy is Director of Fellowship Training and Director of Computing Services, Department of Psychiatry, Albert Einstein College of Medicine, Bronx, NY. Dr. Kramer's e-mail address is: tamkmd@aol.com. Mr. Kennedy's e-mail address is: kennedy@aecom.yu.edu.
