I just read a CNET News Blog post by Steve Tobak on what he considers to be the Top 10 Technology Flops, where he has placed speech recognition in the middle of the pack, with the words “This has to be the biggest disappointment of all, especially for Star Trek fans. But here we are, still banging away on our keyboards. At least biometrics is starting to gain some traction.” My first thought was “obviously this guy hasn’t done his research”, and then maybe, “he just didn’t define the scope of what he thought speech recognition was supposed to do in 40 years.” I think I’m leaning towards the first.
Sometimes technology gets a bum rap. For example, I’d be rich if I had a dollar for every time someone who didn’t like voicemail talked about ‘voicemail jail’. Text-to-speech has gotten so good now that it’s been quite a while since I’ve heard anyone refer to TTS output as sounding like ‘a drunken Swede’. But I still have a hard time when someone dismisses speech recognition because it doesn’t live up to the standards envisioned in Star Trek, or the more often cited HAL in 2001.
So in defense of speech recognition, in 40 years we have come light years in what it can do. No, it’s still not perfect, but its applications are broad and, in so many cases, useful. Look at just a few examples, such as voice-activated auto-attendants that allow you to get to anyone (or their voicemail) at any time of the day, so that you don’t have to have an operator at 2 am, or figure out how to spell someone’s name on a touch-tone keypad. Take the thousands of speech-driven IVR applications (for better or worse in design – but that’s not speech recognition’s fault) that let you say your menu choices, or use open-ended dialogue, to do everything from booking airline tickets to getting driving directions or reordering your prescriptions.
Along these lines, we also have incredibly valuable uses for speech in voice-activated dialing and voice search on mobile devices. For one of the most powerful examples I’ve seen of what the technology can do, you need only go to YouTube and watch the video “Texting While Driving” that Nuance did at their recent Conversations convention in Florida this fall. In it they did a demonstration that pitted two sets of car drivers against each other – one set used a mobile phone and speech recognition to get restaurant information, send an SMS message, and find and play a song, and the other used a mobile phone to do the same things, only the second set had to text in the queries while trying to drive. It’s a good thing that it was just a demo because the second set of participants would have been dead if it had been real.
And yes, we are still banging away on our keyboards – well, most of us are, because it’s faster and we can. Even a blind friend of mine just types into her computer instead of using speech recognition. However, she couldn’t use the computer if it weren’t for speech rec’s sister, text-to-speech. But for those people who need assistive technology, speech recognition has been a lifesaver. Further, in many cases speech recognition has vastly improved the speed and accuracy of data input in work environments over banging away on that keyboard. Just go check out what companies such as Datria are doing in the areas of field service automation and warehouse applications. It’s pretty amazing stuff.
I could go on for a long time because I’ve been watching this industry for half of those 40 years, but I won’t. I’ll just say that there is no way I’d put speech recognition on a flop list of any type.
