Voice recognition is now shockingly good, but it's poised to get much better as the recognition services move into the cloud and suddenly acquire vast new speech datasets to test against. Microsoft is going so far as to call voice "the new touch."
According to Microsoft this week, "voice is the new touch." Never mind that we've been hearing the "voice recognition will change the world" mantra for more than a decade now; this time, it's the real deal! And the company might be right, thanks in part to the peculiar power of the cloud.
With the launch of Windows 7, Microsoft is again talking up its voice recognition efforts, which extend from operating systems to cars to mobile phones. The company has certainly been hammering away at the technology for quite some time; limited versions have been included in Office for years, and a full speech recognition package was built into Vista. Bill Gates has also been predicting the rise of voice communication for a decade.
Speech control is now built into Windows 7, the Sync system that appears in Ford cars, the Bing for Mobile program, and even Exchange Server 2010, which uses it to turn voicemail into text. None of this is particularly innovative on its own, of course; Google has long offered voice search in its mobile app, while companies like Nuance and MacSpeech have both marketed superb voice recognition programs (read our NaturallySpeaking and Dictate reviews) that far exceed the system that came built into Vista (we haven't had a chance to put 7's voice recognition through its paces yet). Even voicemail transcription, which Microsoft calls "one of the most eagerly awaited features" in the new Exchange Server, has been around for years from other providers.
But Microsoft does have something important: a crack speech-recognition team with access to cloud-based voice recognition servers. It acquired Tellme in 2007, and the Speech at Microsoft group now controls the Tellme voice platform, which manages more than six million calls per day.
This is crucial, because one of the big problems in training computers to do voice recognition well is collecting enough data. By way of comparison, Google built a powerful search suggestions system that uses its massive database of search queries to offer suggestions on misspelled words. Voice-recognition systems have traditionally relied on a similar sort of training, learning from the corrections their users make, but the useful correction data they acquired was generally locked on a user's PC.
Nuance, one of the world's largest players in the speech recognition game, recognized that it could construct a better recognition engine if it could somehow get access to all of the speech data being collected by the millions of users of its Dragon NaturallySpeaking product. So it tried just that with version 9 of the software: users were offered a free, "tuned" recognition engine that was specially trained to their own voice if they consented to collect and share several hundred megabytes of data with the company. This data collection effort paid off with NaturallySpeaking version 10, which achieves remarkable accuracy rates with only a couple of minutes of user voice training up front.
So imagine what happens when the data collection effort is moved into the cloud and companies set up services used by millions of people. Recognition engines can quickly be trained in millions of voices, in thousands of dialects. That's why Microsoft's chief scientist in the Speech group, Larry Heck (who used to be the vice president of research at Nuance), said this week, "Speech belongs in the cloud. Only there can you reach the scale, the enormous volume of interactions required to create a speech system capable of rivaling human understanding. With the formation of the Speech at Microsoft group, the unrivaled breadth of our platform today, and our cloud-based approach, this future is within sight."
As someone who has used speech recognition regularly for years on multiple platforms, I'd say this future is "within sight" in the same way that I can see the moon out my window every night. Still, that's something, and anyone who tried to use voice recognition before, say, 2005 will be shocked by its capabilities and actual usefulness today. As companies like Microsoft, Google, and Nuance deploy more voice services that live in the cloud and not on a local machine, advances in understanding should accelerate. That's an exciting prospect for anyone (*cough* Editor in Chief Ken Fisher *cough*) who has ever had speech recognition software turn "but the fields" into "blood to feel."
Source: ars technica