We live in a time of turning tables. For hundreds of years, humankind has diligently invented devices whose operation excluded the way of interacting with the world that is most natural to our species: speech. Yet, if switches, toggle switches, pedals or a steering wheel can be counted as rather feeble attempts to stand against nature, the appearance of computers and, later, smartphones was a bald-faced challenge. We did not even notice how we stripped speech of its leading role in our contact with the Universe and handed that role to our fingers. Drumming on a keyboard or tapping a screen has become so routine that we no longer think how strange it is to have the most powerful and unique tool, the vocal apparatus, at our disposal and yet to solve our most important problems with quite primitive and limited external input devices.
Fortunately, any anomaly is finite by definition. Voice-driven interfaces have been defending their right to exist for many years, and today it looks as though we might even talk about their chances of winning.
Few people know this, but work on speech recognition systems began a very long time ago, sometime in the middle of the last century. Working devices were even manufactured, with very limited capabilities, of course. The subject gained momentum only in the 1990s, when computerization became all the rage, the consequences of which we have the good fortune to see today.
Right around that time I worked on this problem quite closely. In my senior year at the Moscow Institute of Physics and Technology [PhysTech] (Stan Yurchenko graduated from the Department of Control and Applied Mathematics in 1993, Editor’s note), my colleagues and I received financing from an American company: ten thousand dollars, an amazing amount of money back then. We bought an Intel 286 PC, quite a decent one by the standards of 1992, and set up a classic ‘garage project’. A budget and equipment that were ludicrous by corporate standards were more than made up for by our level of expertise; after all, PhysTech is still PhysTech. Given our tender years, we were much more interested in the scientific aspects than in the commercial ones, but our results were of better quality than anything similar available in the West at the time, though only in the area of Russian-language speech recognition. In time, the money ran out, the client melted away and we grew up, but of course we are still interested in the subject. It is just like first love.
These days, it is more interesting to look at speech recognition technologies as a social problem rather than a technical one.
There is a fun discussion technique: take a list of the stages of some process, preparations for a flight to Mars, for instance, and mark a stage in the middle with the note ‘We are here’. Then you can move this ‘here’ marker year after year, regularly putting last year’s opponents to shame.
So, in the case of speech recognition technologies, our marker is still somewhere in the middle of this hypothetical list. We have had highly advanced hardware for a long time; we have excellent mathematics; there are megabytes of code. Designs and solutions left the labs ages ago; we have seen and held in our hands tools that work quite decently. Voice assistants are installed in every smartphone. Yet the public still greets voice activation with a “Wow”. Voice interfaces are treated either as entertainment or as an optional accessory.
Yet again it turns out that we are not ready for our own achievements. Technically, there are no obstacles today to driving a car, for instance, through a microphone, folding your hands in your lap and simply saying “right”, “left”, “faster”, “stop”. Yet we feel more comfortable turning the steering wheel and pressing the pedals.
The most obvious obstacle is ordinary inertia and the protective conservatism that has saved us from extinction so far, which undoubtedly makes the aliens watching us sit up and take notice. Anything new takes getting used to; be it GPS or nuclear fission, a technology enters our lives only once it has grown quite old. This is trivial, however, and no fun to talk about.
It is much more interesting to talk about self-trust. It just so happens that what humans do with their hands, feet or fingers is much more thought-out and intelligent than what comes out of a human mouth. The level of speech entropy of the average representative of Homo sapiens regularly makes one doubt the validity of the second part of that name. There is probably not a human on Earth who has not regretted something they said. I mean, at least once a day.
Now imagine that speech is not only a means of communication but has become the key tool for operating all the hardware around us, and that the cost of an error has risen accordingly. Speech will become exactly that, and unavoidably so. It may not happen in five years; it might take as long as fifteen, but it will happen simply because it is natural; we just got distracted for a while and turned off onto a small dead-end road.
So these supposed fifteen years are the time humankind has to rethink its approach to what it says aloud. Alternatively, the problem will be solved in the usual evolutionary way: by natural selection. When you are at the controls of a voice-activated plane, you can say “Let it all go to hell” only once. Just once.
One thing I am absolutely sure about: whichever of these ways humankind chooses, in the near future we will give speech back its leading role. The light will come on, the water will flow, the door will open, all at a word.
About the Author
Stan Yurchenko was born in 1970 in Tolyatti. In the 1990s, he studied at the Moscow Institute of Physics and Technology, where he came across the subject of computer speech recognition. His project did not achieve commercial success back then, but it gave its authors the experience and expertise to discuss the issue today not only from a technological but also from a philosophical point of view. Stan Yurchenko has varied experience: he was head of the IT Department at the Central Agency of Air Service of Russia, held the office of Vice President of the Russian Academy of Business and Entrepreneurship, and worked as CEO of the Urban Information Technologies Center. His largest project was probably the development and organization of the utilities payment and accounting system for the Unified Information and Payment Center Automated Control System in Moscow. This system is currently used by the Government of Moscow to account for utilities payments by Moscow residents as part of the My Documents Multiservice Center, which provides state services. Right now, he is working on investment projects, including technological ones (blockchains, fintech). He also participates in attracting foreign investment to Russia to create Competence and Technology Development Centers. That is why it is all the more unexpected and interesting to hear the forecast of mathematician and entrepreneur Stan Yurchenko on a subject that lies well beyond the boundaries of information technology.