At the I/O Conference this week, Google unveiled major improvements to its Google Home voice-based assistant. This is, of course, their response to Amazon Echo, which has proven to be a mega-hit consumer device. They also recently announced the availability of the Google Assistant on iPhone, taking aim at Siri’s growing popularity. Amazon, for its part, just gave Echo a major upgrade, adding both camera and display options. Apple is rumored to be preparing an “at home” embodiment of Siri. So, this is shaping up to be a heated horse race.
If these “meta-assistants” continue their growth in popularity, are they going to become a dominant interface for customer service? If so, what communication channels will they displace? And how will the corporate owners of these platforms, who already wield unprecedented power, influence the way customer service is done?
The Case in Favor
Stick with me for a minute… The fastest growing channel for customer communication is chat. That statement holds true whether you lump “chat” together with “messaging” or insist on a distinction between the two. See more on that debate here.
Chat is very cost effective and matches behavior that younger consumers already favor. Chat also sets up the transition to “bots”, or any other flavor of text-based self-serve options. (Of course, self-serve has always been and will always be the lowest cost form of customer service. That’s ultimately what fuels the excitement around bots. More here.)
What does all this have to do with intelligent assistants? Well, one of the biggest problems with chat / messaging / chatbots is discovery. That is, how does a typical consumer find out that company X offers chat and on which channel? Do I search on Facebook Messenger? Send a tweet? Send an SMS? Go to the website and look for a chat widget? Each of those is a valid option, but for a consumer with limited patience (and tech savviness) how many of those paths will they explore? For this reason, phone calls remain the default action.
A meta-bot that fulfills its promise — that is, becomes the gateway to all the services you need — solves this problem. If, say, Google Assistant achieves such dominance that one can simply assume any company you want to reach is available through it, then that is the channel where chat and chatbots reach their full potential. This level of dominance is indeed possible: WeChat achieved it for commerce in China, and LinkedIn for professional networking. But once we reach that point, the question becomes, “What’s the other side of the bargain?” That is, what will Google want in return for becoming the customer service gateway?
The Case Against
Two things are working against this future. The first is simple fragmentation. Amazon, Google and Apple all have strong offerings. Microsoft (with Cortana) and Facebook (with “M”) are hoping to catch up as well. It’s likely that no one platform will achieve full dominance.
The second is the limitation of a voice interface. (Google Now and Microsoft Cortana both have text as an alternate interaction method, but the critical ground to capture seems to be the “voice-based living room device”.)
Controlling technology via voice has a fundamental problem that analyst Benedict Evans calls the “uncanny valley”, which is the “I don’t know what I can ask it” problem.
This is a problem that won’t go away even with better speech recognition technology (which is already pretty good). The core of it is that discovering the boundaries of what a particular voice assistant can do is difficult. With a visual interface, you can see the available options. With an audio interface, not only do you have to learn the skill-space of the assistant, you have to memorize it, because there are no cues to help you along. Worse: if you use multiple assistants, you have to remember that each has a different skill-space. Worse still: those skill-spaces are being updated all the time, so you have to keep learning.
So, if you really have a HAL9000 / Jarvis, where you can talk freely, then fine. At the other end, if you have a voice interface that will only tell you the weather, that’s also fine. But in between — where the conceit is “ask me anything” but there’s actually a very limited skill-set — THAT’S the big valley. In the valley, the interface puts a large burden on the user, who must continually probe and remember what can be asked and how.
So there you have a sketch of some of the opposing forces at work here. This is a fascinating space to watch, as we see a combination of incredible technology gains (in particular with speech recognition) and consumer behavior change (who would’ve guessed millions of people would be comfortable with an always-on microphone in their kitchen sending their audio to Amazon?). I’d love to hear any feedback in the comments!