Recently, I blogged about this podcast, in which Voicebot.ai’s Bret Kinsella talks with John Kelvie from Bespoken about how “domains” will replace voice apps. I enjoyed John’s blog about this concept so much that I wanted to excerpt again from the blog:
Most of what is written above hinges on just a couple of key observations:
– Users do not remember invocation names
– Multi-turn dialogs sort-of work – in some cases they are useful and appropriate. But for the most part they annoy users and should be avoided.
If you accept these observations, everything else I’ve laid out follows fairly naturally. Of course, someone might come up with (or perhaps unbeknownst to me, already has) how to (a) improve users’ memories (b) remind them of phrases and experiences without annoying the love out of them, and/or (c) miraculously, markedly improve the state of the art of speech recognition. But assuming none of the above occur in the next 12-18 months, I believe most of what I have written is inevitable. At least, it is if we want to have a vibrant ecosystem for third parties.
The past few days I’ve been blogging about this podcast, in which Voicebot.ai’s Bret Kinsella talks with John Kelvie from Bespoken about how “domains” will replace voice apps. I wanted to offer one last excerpt from John’s blog, taken from the end of the post, about how companies that are building their own voice assistants might be better served doing something else:
The devices in column one are inevitable and in some cases are already essential. Column two? Many may seem silly but some nonetheless will prove indispensable.
And these are JUST the devices with voice-capabilities embedded – the march of voice continues to be the march of IoT. Voice is our point of control for the ubiquitous computing power that exists around us. If you imagine a world in which the average cell phone owner has just ONE of each of the above items, the coming wave of voice-enabled devices looks like a tsunami. And if you factor in the devices under their control (thermostats, lights, power switches, appliances, etc.), it becomes even more staggering.
And the very good news is third parties have a huge role to play – the big guys need to provide the platforms and the device access, but they cannot do all the fulfillment. The future of the ecosystem is everyone playing nicely together in this new query-centric, domain-centric world, in which first and third-parties work together seamlessly.
For the platforms, it’s the chance to employ, at massive scale, the wisdom of the crowd – the wisdom of every brand, app builder, API and website on earth. What an amazing achievement it will be.
For third parties, it’s the opportunity to meet users, wherever they are, whatever they are doing – properly done, they will be just a short trip of the tongue away.
In this podcast, Voicebot.ai’s Bret Kinsella talks with John Kelvie from Bespoken about how “domains” will replace voice apps. It’s an interesting discussion – albeit a little hard to follow at times – so I recommend reading John’s blog about this concept before you listen to the podcast. Here’s an excerpt from the blog:
If you want to talk to someone, call your Mom. If you want to build a voice experience:
– One-shot is preferred
– Where one-shot is impossible, quick, contextual follow-ups are the next best option
– As a last resort, attempt an extended multi-turn dialog
Is this to say that extended interactions are a complete failure? Absolutely not! But describing them as conversations misses the point. How are they not conversations?
– They are not open-ended
– They lack important context – such as body language, intonation, emphasis, and past interactions
– They have very poor understanding – both in terms of speech and intent recognition
All of these things are likely to improve radically over a five to ten-year time horizon. But 12-18 months? Not so much. Almost certainly not enough to change what is feasible for most implementers.
RAIN has posted its list of predictions for this year – including this one:
On the heels of Beeb, Erica, and Hey Mercedes, 2020 will see brands in many industries seeking more control over their voice assistant footprint – spanning data and the customer experience – in the form of creating “owned” voice assistants in their brand’s image. There will be another set of major brands – from automotive to consumer electronics, financial services to QSR – that introduce their own voice agents, with their own personas and voices, in the year to come.
The Voice Interoperability Initiative will begin to connect these more disparate, specialist assistants with more generalist intelligences like Alexa, so as to make them more useful in more places.
In this podcast, Voicebot.ai’s Bret Kinsella talks with Maarten Lens-FitzGerald (the “Dutch Cowboy”), starting at the 38:40 mark, to get his perspective on how voice will fare in 2020. Here are some of the points made:
1. Last January, Maarten predicted that 2019 would be the ‘year of boredom’ in voice. Other than the negative stories about privacy (or lack thereof), Maarten’s prediction proved fairly accurate.
2. Maarten talked about two types of confusion – for users and for organizations.
3. For users, there will be confusion for three reasons: (a) the “walled garden,” where each major voice provider has its own ecosystem that isn’t necessarily compatible with the others; (b) existing voice users will go “deeper” with their voice experiences, and getting beyond playing music and checking the weather can sometimes lead to failed experiences; and (c) the laggards to voice will tend to be those who are less tech-savvy and will need better instruction.
4. Before the Web arrived in the mid-’90s, walled gardens existed in the online world (e.g., bulletin boards), but the birth of browsers broke them down. Walled gardens still exist for mobile. Walled gardens for voice are not likely to break down anytime soon because the major players have invested billions and don’t have an incentive to collaborate on unifying standards. But users are beginning to break down that wall a little bit (e.g., using Amazon’s Echo Buds with an Apple iPhone). It is still too early for most users to live with just one major player’s ecosystem.
5. Organizations are confused about the ROI for voice, so it’s a belief-based technology right now. You’ll need someone convincing the boss with creative numbers and good storytelling. It is too hard to tell yet what voice works best for – is it customer service? Content? We don’t know yet. In a way, voice faces the same type of challenge that augmented reality does.
I learned a lot about the state of hearables in this Voicebot.ai podcast hosted by Bret Kinsella with Dave Kemp & Andy Bellavia. Here are some things I learned:
1. There will likely be a tech disruption to the traditional high-end headphone market. Will Bose, etc. get bought? Or live on as just a high-end niche player in the intelligent hearable space? It is unlikely that the high-end headphone companies will take much market share from the leaders in intelligent hearables (which already have pretty nice fidelity).
2. Apple will likely continue to dominate the iOS ecosystem, as Apple’s AirPods have a very high satisfaction rating from customers (98%). However, the Android ecosystem is more wide open.
3. So far, AirPods pretty much get used the same way that smart speakers are used (e.g., phone calls, texts, music, podcasts, audiobooks, setting alarms and timers). How will that vary going forward? At the 49-minute mark, there is a good discussion about how geolocation offers opportunities for hearables, such as in-store purchases (e.g., in a retail store, a hearable can tell you where to go for a specific product or help upsell or cross-sell) or catching a train. There will be more interactivity with apps using hearables.
4. At the 54-minute mark, Bret notes that only 20% of AirPods users have used the voice assistant feature in them. That surely will change soon enough as users become more comfortable with voice and explore the new modalities that it offers.
A few years ago, I was in Tokyo and saw a Jibo robot. It was freaky. This Voicebot.ai article describes Samsung’s new robot. It’s a ball that can follow you around – as opposed to prior robots that were stationary – and interact with you. It has more functionality than prior home-oriented robots, as this excerpt illustrates:
Samsung is also looking beyond simple request-and-response interactions of earlier social robots and smart speakers. Demonstrations of Ballie show it identifying problems in the home such as spilled food and a tipped over plant. In both cases, Ballie proactively called a robot vacuum or air purifying system to the location of the problem. There was no requirement for the homeowner to take action or even know an issue had taken place.
Another example involved Ballie observing a demonstrator watering a household plant. The woman’s task list for that day included, “Water Plants” and Ballie automatically checked off that task as complete. These are examples of virtual assistants with agency. That means they are granted authority to take actions on behalf of the user even without an explicit command. Google Duplex and the forthcoming Ring Doorbell Concierge are other early examples of voice assistants with agency. This is clearly a feature set that the leading voice assistant providers assume will be important and beneficial to users.
Voicebot.ai’s Bret Kinsella has put together a great collection of predictions for this year from 46 of the largest names in voice. Here are a few of my favorites:
– SoundHound’s Katie McMahon: The love affair we have with hardware design will migrate to a love affair of Voice Interface Design. Although I doubt “a Jony Ive of Voice” will emerge within 2020, I predict that by the end of this decade, we will know the names of a few revered VUI designers. It will be those who can design the future by understanding both its current technical limitations and trajectory while harnessing anthropological, sociological, and humanity-first guiding principles.
– Viv Lab/Samsung’s Roger Kibbe: In 2020, having a voice presence will start to be a strategic and business differentiator for companies. We are moving beyond voice as a side innovation project to it being a first-class citizen on the same level as social, mobile and web. Companies who have or will soon establish a voice presence will start to reap the business benefits over laggards, much like what happened with the web and mobile.
– SimpliSpoken’s Mark Phillips: Discoverability is the key issue holding the ecosystem back from realizing the potential that voice experiences offer. Even with the encouraging market penetration of voice platforms, consumers are for the most part unaware of what voice can do. I do not believe voice platform vendors, voice experience developers, or businesses can solve this problem in isolation. I predict an independent third party will attack this issue with a platform that brings together consumers, vendors, developers, and businesses to provide shared value and incentives to cross the chasm.
– Bespoken’s John Kelvie: The rise of a new domain-centric development model for third-parties. The initial wave of voice was based around an app-centric model. This made sense, as the analogies and onramps for developers coming from mobile and web were so easy to make. But domains make more sense for users. Domains are top-level intents with third-party fulfillment. It also means that users are defining functional boundaries, not developers or product designers. Finally, it means discovery is moot. Forget about tricks and gambits to make users memorize and chant invocation names. Instead, builders must discover users where they are, in users’ natural expressions and requests.
To effect this expeditiously, the platforms must provide a way to bring third-parties in on top-level intents fairly and transparently. And third-parties must take users as they come – with a myriad of queries and commands that may not fit neatly into their existing app-centric way of thinking.
– Voice Metrics’ Stuart Crane: One of the breakout hits in the voice space for 2020 and beyond will be voice-activated rings, starting with the Echo Loop. Alexa users will appreciate the ability to use Alexa anytime, anywhere without having to have a smart speaker nearby, or headphones in their ears. I predict that in 2020, Apple will take notice of this new “category” (voice-activated rings), and begin developing a Siri-compatible ring, which, sometime after 2020, will become an even bigger hit than the Echo Loop.
– Altbrains Workshop’s Fred Zimmerman: There will be a “broadcast to the world” event where a voice agent talks to everyone at the same time. It may be planned or it may be an accident–Google issues an emergency notification about a global threat; Jeff Bezos issues a personal message the day before the election; Siri gets hacked — who knows! And it may or may not contain an interactive element where the system is able to act effectively on the hundreds of millions of responses it will receive. But, it will illustrate voice’s power to touch everyone at an emotional level at the same moment — a bit like Orson Welles’ War of the Worlds and radio. I may be early — this may not occur in 2020, but later — but it is coming.
Here’s an excerpt from Dave Kemp’s take – as reported for “voicebot.ai” – on Apple’s latest earnings report, which shows how Apple increasingly is relying on hearables for growth:
The narrative that surrounds Apple has changed recently from being centered around the iPhone’s sales growth to a narrative more focused around wearables and services. This new narrative, which I wrote about in Voicebot after Apple’s previous earnings report, has become even more pronounced this quarter as Apple’s revenue becomes less dependent on the iPhone and continues migrating toward wearables and services. For the first time in 7 years, more than half of Apple’s revenue came from products outside of the iPhone.
Apple lumps AirPods, Watch, HomePod, Beats, and Apple TV into its “wearables, home and accessories” category (among a few other small products). It’s hard to know how much each product within the category is contributing to the total number, but given Tim Cook’s propensity to broadly refer to the category as, “wearables,” HomePod’s lackluster penetration into the smart speaker market, and the lack of attention Apple has put toward Apple TV’s innovation and marketing over recent years, it seems likely that the vast majority of this category’s revenue is tied specifically to the wearables (Watch, AirPods, Beats).
Here’s the intro from this article from the “Artificial Lawyer”:
Today, the UK Government’s National Health Service (NHS) announced that citizens will be able to get health ‘advice’ via the Amazon Alexa home voice system – in effect an AI system that is learning on the job. And this got Artificial Lawyer thinking: what if governments around the world did this for legal?
Now, the idea of using voice systems to provide legal advice is not new – it’s been a mainstay of hackathons the world over. We have also seen legal bot developers such as LawDroid build A2J versions of voice systems. And we have seen universities and law schools in the US, such as Suffolk Law, get to work on the foundations of what could become such systems by identifying key legal questions that consumers ask online.
But, what if this idea was picked up and developed on a national scale, for example, by the UK’s Ministry of Justice, or the US Department of Justice?