So we're spending the Easter holiday in Beijing, admiring all the new ultra-small mobile phones in the FAB electronics store in Oriental Plaza on Wangfujing, when Geoff and I launch into an argument about how small a mobile phone can ultimately get. I belong to the school that says we'll eventually do away with keypads and simply have a tiny headset (similar to the Jabra JX10 Bluetooth headset) that you wear on your ear and operate entirely with your voice. The need for a visual interface is the limiting factor for mobile phone size. Geoff doesn't think visual displays will ever be replaced by voice interfaces: "What if I want to look at my list of last dialed numbers and add one of them to my address book? How would the phone know to do that?"
"You'd say to the phone: Read last dialed numbers..."
"What if it's the fifth number on the list? It'll say, your last dialed number is 5,5,2,3,...and so on; that takes a long time."
"The key is you can interrupt it by saying 'Next' before it finishes speaking the number, and it'll start reading the next number in the list."
"What about accuracy? What about wind noise? It needs to be 99.99% accurate for it to be practical."
"Yes that is a major hurdle, but you can probably train it so that it recognizes your voice and can pick up its frequency from background noise. It'll probably need to have noise canceling function so that it can filter out the background noise too. That's already available in the newer Bluetooth headsets."
While we're on the subject, I hope some researcher out there is working on a way for computers to do exactly this, i.e. pick out your voice from background noise, so that you don't need to wear a headphone and mic when you're controlling your PC with your voice.
I have a friend who does pure research for NTT in Japan, a Venezuelan who specializes in speech recognition. Apparently, present methods of speech recognition, such as the HMM (Hidden Markov Model), which translates your spoken words into a sequence of symbols that can be manipulated, are already pretty accurate, up to 98% or 99%, even with continuous speech. The key obstacles now to widespread acceptance of hands-free PC or mobile phone operation are the need for an optimal listening environment (i.e. low background noise) and user resistance to aural user interfaces. The day when your computer can pick up your voice from 4 feet away, or even from across the room, and accurately execute your commands is when voice recognition will really take off. You won't ever need to touch that speakerphone on your desk, or your microwave oven, or the TV, and so on.
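To give a feel for what an HMM does, here is a toy sketch of Viterbi decoding, the standard way of finding the most likely sequence of hidden states behind a series of observations. This is not how NTT or any real recognizer is built: the two states ("silence" and "speech"), the two observation symbols ("quiet" and "loud"), and every probability below are numbers I made up purely for illustration.

```python
from typing import Dict, List, Optional, Tuple


def viterbi(
    observations: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               the state that path came from)
    best: List[Dict[str, Tuple[float, Optional[str]]]] = [
        {s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}
    ]
    for obs in observations[1:]:
        column: Dict[str, Tuple[float, Optional[str]]] = {}
        for s in states:
            # Pick the predecessor that gives the highest path probability.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Made-up toy model, for illustration only.
states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.3, "loud": 0.7},
}
print(viterbi(["quiet", "loud", "loud", "quiet"], states, start_p, trans_p, emit_p))
# prints ['silence', 'speech', 'speech', 'speech']
```

A real recognizer works on acoustic features rather than "quiet" and "loud", and its states correspond to pieces of phonemes and words, but the decoding idea is the same.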
And the beauty of it is, either hardware or software could be the path to the holy grail. This means the solution could come from a hardware manufacturer like Nokia or Samsung, or a software company like Microsoft. Most likely, though, the result will be a combination of both.