Programs that can hear a word and, say, type it out are old. Despite their advancing age, these programs still can’t interpret speech easily.
Have you tried the automated call centers where you have to over-enunciate the word you are trying to use just so the dumb computer program gets it?
Well, dumb is part of it. I swear that humans themselves can’t interpret speech that well either. Part of our expertise in interpretation is that we half know what is going to be said. In other words, we are using more of our mind than just our interpretation-of-speech centers.
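Here is a toy sketch of that "half knowing" idea, in Python. It is only an illustration, not how any real speech program works: the words and every score in it are made up, and `best_guess` is a hypothetical helper name. The idea is to combine how well a word matches the sound with how likely the word is given the conversation so far.

```python
# Toy illustration only: every number below is made up.

# How well each candidate word matches the sound alone (hypothetical scores).
acoustic_score = {"billing": 0.40, "building": 0.45, "filling": 0.15}

# How likely each word is given what the caller was just asked (hypothetical).
context_score = {"billing": 0.70, "building": 0.05, "filling": 0.25}


def best_guess(acoustic, context):
    """Pick the word with the highest combined (multiplied) score."""
    combined = {word: acoustic[word] * context.get(word, 0.0) for word in acoustic}
    return max(combined, key=combined.get)


# Going by sound alone, the slightly better acoustic match wins.
sound_only = max(acoustic_score, key=acoustic_score.get)

# Adding expectations from context flips the answer.
with_context = best_guess(acoustic_score, context_score)

print(sound_only)    # building
print(with_context)  # billing
```

With these invented numbers, the sound alone favors "building", but a listener who half expects a question about a bill lands on "billing" instead.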
Then there are some people like my dad. A few years ago his hearing was getting really bad. I was seriously worried that I’d never be able to communicate with him like I did when he was younger. He refuses to use the various hearing aids he has. And yet his interpretation skills have gotten better.
What’s that? I said he can interpret better. So much better, in fact, that most of the time I just think I’m talking with him like I used to. Either he has gotten way better at working with less sound, or he is now at least partially reading lips. Don’t get me wrong. I’m happy about this. Every once in a while I get curious and ask him if he has his hearing aids in. Each time, the answer I expect turns out to be wrong. He doesn’t have his hearing aids in.
So computers might have to become more skilled in other areas before they become skilled enough to interpret speech as well as a human.
In fact, I think computer interpretation of the human voice is going to have to be better than human in order to do all we want computers to do.
Even human speech recognition is obviously fallible. Think of the saying “card sharp”. Others might say “card shark”. I think one of these two cliches grew out of someone mishearing the other. I wonder which saying came first. “Coincidence,” you might say. But what about another pair of sayings: “end of the road” and “end of your rope”? There seems to be a pattern of similar-sounding sayings that mean the same thing, and I think it comes from our inability to interpret speech exactly.
I think we humans each hear things differently from one another. It’s going to be hard to have a computer interpret words in such a way that all humans agree on the meaning. And I didn’t even bring accents into the conversation – yet.
These misheard sayings are called ‘eggcorns’. Language Log has a truly fascinating explanation of eggcorns, along with names for plenty of other language oddities: snowclones, crash blossoms, the Cupertino, and more.