Microsoft researchers have hit a milestone 25 years in the making. The company's conversational speech recognition system has reached a word error rate of just 5.1 percent, putting it on par with the accuracy of professional human transcribers for the first time.
A year ago, Microsoft's speech and dialog research group refined its system to reach a 5.9 percent word error rate. This was generally considered to be the average human error rate, but further work by other researchers suggested that 5.1 percent was closer to the mark for professionals transcribing conversational speech.
For over 20 years, a collection of recorded phone conversations known as Switchboard has been used to test speech recognition systems for accuracy. This is done by tasking either humans or machines with transcribing recorded telephone conversations between strangers on topics including politics and sport.
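For readers wondering how a figure like 5.1 percent is arrived at, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that general definition only. It is not Microsoft's evaluation code, the function name `word_error_rate` and the sample sentences are made up for illustration, and it assumes plain whitespace-separated words with no of the text normalization a real benchmark would apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative sketch: splits on whitespace and compares words exactly,
    with no case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one word dropped from a six-word reference.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{rate:.1%}")
```

On this reading, a 5.1 percent rate means roughly one word in twenty is substituted, dropped or wrongly inserted relative to the reference transcript.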