If millions of years of human evolution have taught us anything, it is that people have always preferred to interact by voice. For decades, however, the consensus was that the technology “wasn’t quite there yet” for use with machines. In the interim, the nose-to-the-phone model of personal computing became the de facto standard, and no one really questioned it. Things have begun to change in the past few years as Apple Siri, Amazon Echo, and Google Assistant gained momentum, and the quality of experience (QoE) they offer is a welcome relief.
The pleasant-sounding demeanour of voice assistants beautifully masks the complexity of the underlying technology, and that is exactly what was needed for mass-market adoption. We finally have a relatively inexpensive device that offers an iconic experience and the ability to increase commerce. I am told that Amazon Prime members who own an Echo tend to spend $200-300 more per annum than Prime members who do not use these devices. In addition, conversational commerce, the intersection of messaging apps (WhatsApp, Facebook Messenger, Echo) and shopping, is finally reaching critical mass. For the first time ever, more than 50% of the 10+ trillion digital messages sent in 2016 went through these conversational messaging platforms rather than email.
My son and I adore our Amazon Echo, and Alexa has become the 5th family member in the house! We find it anchoring us to the home base, a throwback to the 1950s, when the house was rearranged around wherever the radio or TV sat. We have moved our Echo from the kitchen to my son’s bedroom to the basement and now back to the kitchen, where we believe it really belongs. As it stands, a report from PwC states that 59% of people between the ages of 18 and 24 interact with a voice assistant at least once a day, compared to 65% of 25-49 year-olds. Based on my personal experience and the rapid rise in the fan base of these devices, I estimate that by 2025 nearly 60-70% of all interactions with machines will have changed significantly from where they stand today: mind reading, gesture control, and voice assistant interaction will be the name of the game. What we are seeing is the 1st phase of the proliferation of these devices, and how ubiquitous they become will depend on how quickly the OEMs can address the following 4-5 things:
Make them conversational
Today, my son and I nearly bark at Alexa rather than converse with it. The “Alexa Voice” feels like it needs to be “used” as opposed to just “conversed” with. Nobody wants to feel like they are scolding someone to try and find out what the weather is.
Less of an assistant, more of an advisor
Wouldn’t it be great if these devices proactively spoke to me, as opposed to waiting for me to start an interaction? “Sachin, don’t have that ice cream” or “Sachin, time to call a cab,” and so on. Ideally, the voice assistants of the future will be laden with proactive features.
They should also be able to leverage AI and ML to give context to a conversation, so that it continues from where it left off instead of starting from scratch every time. These two fields are advancing daily, and voice assistant technology needs to adopt those advances as quickly as they emerge.
Secure and Private
Ensure that there is some form of know-your-customer (KYC) identity check as these personal devices are exposed to larger audiences. The dichotomy of near field and far field needs to be sorted out, and quickly.
Ability to understand emotions and react
They should be able to understand emotions and to emote as well. As they always say, it’s not what you say but how you make me feel that really counts.
Nevertheless, we have come a long way from 1956, when a 16-year-old Victor Scheinman first invented a speech-to-text transcription device. We could have reached where we are today a lot earlier, but we finally have the right business reasons and motivation to keep evolving and growing it! With Amazon Echo, Echo Dot, Echo Look, Echo Show… the list is going to grow very quickly as access to the software becomes universal and less elitist!
Can’t wait for it to be omnipresent!