Speech recognition is a complex task that draws on many sources of
knowledge (acoustics, phonetics, linguistics, semantics) and on
sophisticated techniques (signal processing, acoustic-phonetic models,
neural networks, statistical language models).
Recognition accuracy depends heavily on the type of data being
processed. It is typically measured as the word error rate (WER),
which can be as low as a few percent for some tasks and as high as
40% for very challenging ones.
Because speech production and perception are such complex
processes, progress in speech recognition has always been slow.
However, word error rates have fallen steadily for about 25 years,
and progress is expected to continue for many years to come.
Progress can also be measured relative to human recognition
performance, and by this measure the gap between humans and
machines is narrowing slowly but surely, year after year.
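The word error rate mentioned above is conventionally computed by aligning the recognizer's output (the hypothesis) with a reference transcript using a minimum edit distance over words, then dividing the number of substitutions (S), deletions (D) and insertions (I) by the number of reference words (N): WER = (S + D + I) / N. The Python sketch below illustrates this; the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no case or punctuation normalization, which real scoring tools typically apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N, where N is the number of reference words.

    Hypothetical helper: tokenizes on whitespace only, no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER = {word_error_rate(ref, hyp):.3f}")  # prints "WER = 0.333"
```

Because insertions are counted against the reference length, WER is not bounded by 100%: a short reference paired with a long, wrong hypothesis can score above 1.0.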