Saturday, March 27, 2010

ASR: AUTOMATIC SPEECH RECOGNITION

What is Automatic Speech Recognition?

Automatic Speech Recognition (ASR) is a technology that allows a computer to identify the words that a person speaks into a microphone or telephone. The "holy grail" of ASR research is to allow a computer to recognize, in real time and with 100% accuracy, all words that are intelligibly spoken by any person, independent of vocabulary size, noise, speaker characteristics and accent, or channel conditions. Despite several decades of research in this area, accuracy greater than 90% is attained only when the task is constrained in some way, and the level of performance depends on the constraint. For example, recognition accuracy for continuous digits over a microphone channel (small vocabulary, no noise) can exceed 99%. If the system is trained to learn an individual speaker's voice, then much larger vocabularies are possible, although accuracy drops to somewhere between 90% and 95% for commercially available systems. For large-vocabulary speech recognition of different speakers over different channels, accuracy is no greater than 87%, and processing can take hundreds of times real time.

The dominant technology used in ASR is the Hidden Markov Model, or HMM. An HMM-based recognizer estimates the likelihood of each phoneme in small, contiguous regions (frames) of the speech signal. Each word in a vocabulary list is specified in terms of its component phonemes. A search procedure then determines the sequence of phonemes with the highest likelihood. The search is constrained to consider only phoneme sequences that correspond to words in the vocabulary list, and the word whose phoneme sequence has the highest total likelihood is taken to be the word that was spoken. In standard HMMs, the likelihoods are computed using a Gaussian Mixture Model; in the HMM/ANN framework, these values are computed using an artificial neural network (ANN). For more details about HMM technology, as used in the HMM/ANN framework.
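The constrained search described above can be illustrated with a small sketch. It is not the implementation of any particular recognizer: it assumes isolated-word recognition, one HMM state per phoneme, and transition probabilities that are ignored (treated as uniform). The phoneme symbols, the toy lexicon, the per-frame scores, and the function names `word_log_likelihood` and `recognize` are all made up for illustration. In a real system the per-frame scores would come from a Gaussian Mixture Model or a neural network.

```python
import math

NEG_INF = float("-inf")

def word_log_likelihood(frame_scores, phonemes):
    """Best-path (Viterbi) log-likelihood of one word.

    frame_scores: one dict per frame, mapping phoneme -> log-likelihood.
    phonemes: the word's phoneme sequence, in order.
    Each phoneme must cover at least one frame, and at every frame the
    path either stays in the same phoneme or advances to the next one.
    """
    if not frame_scores or not phonemes:
        raise ValueError("need at least one frame and one phoneme")
    n = len(phonemes)
    # best[j] = score of the best path that is in phoneme j at the current frame
    best = [NEG_INF] * n
    best[0] = frame_scores[0][phonemes[0]]
    for scores in frame_scores[1:]:
        new_best = [NEG_INF] * n
        for j, ph in enumerate(phonemes):
            stay = best[j]
            advance = best[j - 1] if j > 0 else NEG_INF
            prev = max(stay, advance)
            if prev > NEG_INF:
                new_best[j] = prev + scores[ph]
        best = new_best
    # A valid path must end in the word's last phoneme.
    return best[-1]

def recognize(frame_scores, lexicon):
    """Return the vocabulary word with the highest total log-likelihood."""
    return max(lexicon, key=lambda w: word_log_likelihood(frame_scores, lexicon[w]))

# Hypothetical two-word vocabulary and four frames of made-up phoneme likelihoods.
lexicon = {"yes": ["y", "eh", "s"], "no": ["n", "ow"]}
frame_probs = [
    {"y": 0.60, "eh": 0.10, "s": 0.10, "n": 0.10, "ow": 0.10},
    {"y": 0.20, "eh": 0.60, "s": 0.05, "n": 0.10, "ow": 0.05},
    {"y": 0.05, "eh": 0.50, "s": 0.30, "n": 0.05, "ow": 0.10},
    {"y": 0.05, "eh": 0.10, "s": 0.70, "n": 0.05, "ow": 0.10},
]
frames = [{ph: math.log(p) for ph, p in frame.items()} for frame in frame_probs]

print(recognize(frames, lexicon))
```

With these toy scores, the best alignment for "yes" assigns the frames to y, eh, eh, s, and it outscores every alignment of "no". Working in log-likelihoods turns the product of per-frame likelihoods into a sum, which avoids numerical underflow on long utterances.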
