I used to blog more about the challenges of building skills – and how to make it easier for skills to be discovered by folks once you launch them. Here’s a nice piece about skill discovery from voicebot.ai, along with this excerpt:
Zevenbergen’s good fortune to rise to the top of the search results for “What’s my horoscope?” would not be wasted. He had already built user retention elements into his Google Action. First, Zevenbergen wanted to fulfill the intent of the user very efficiently. He had a goal of giving the horoscope as quickly as possible. For new users that simply required determining their birthday. Not only was there a target of delivering the full horoscope within 10 seconds, the Action tells new users that they will receive their horoscope within 10 seconds. It sets expectations and removes a potential concern about how much the user may be committing to with this particular voice experience.
Second, he found that shorter, more concise horoscopes were leading to more completed sessions. There may be an opportunity to convey many paragraphs worth of horoscope goodness but that’s often the opposite of what people want when interacting on a smart speaker. They want the facts. Ensuring users heard the entire horoscope before abandoning the session also gave him a captive audience that was still around when the Action offered to add “What’s my zodiac sign” to a routine or notification. “What’s my zodiac sign” is now getting about 5,000 opened notifications from Google Assistant each day. If you compare that to the DAUs for the Action you will conclude that nearly 85% of daily user sessions are driven by this single technique.
As noted in this announcement, Amazon has a new “Alexa Agency Curriculum” that contains a range of resources for those interested in building skills – including voice strategy & design, development guidelines and launch considerations. Good news for anyone developing Alexa skills…
This Voicebot.ai podcast provides ten short interviews from the CES conference. At the 22:01 mark, Bret talks to XAPPmedia’s Pat Higbie who discusses how speaking to voice apps is so much different than a human-to-human conversation. Among Pat’s comments were these:
– According to a panelist at CES, there are 3000 ways that people have asked to set alarms. So it’s difficult to predict how humans will ask for even a simple function to be performed.
– With voice, you are giving a simple command in an area that has a complex syntax.
– Every time someone tweaks their voice app to accommodate new ways that humans can ask for something, they run the risk of breaking what they’ve built. So you have to be mindful of your existing syntax.
– Right now, there’s a lot of information out there about good design but not a lot about the engineering necessary to pull it off. In essence, there currently is a lack of engineering talent that knows how to deal with complex syntax.
– Multimodal use of voice is rising and there’s a lot of work still ahead for that too. Providers will have to account for those using screens – and those not using them – when they design.
Recently, I blogged about this podcast, in which Voicebot.ai’s Bret Kinsella talks with John Kelvie from Bespoken about how “domains” will replace voice apps. I enjoyed John’s blog about this concept so much that I wanted to excerpt again from the blog:
Most of what is written above hinges on just a couple of key observations:
– Users do not remember invocation names
– Multi-turn dialogs sort-of work – in some cases they are useful and appropriate. But for the most part they annoy users and should be avoided.
If you accept these observations, everything else I’ve laid out follows fairly naturally. Of course, someone might come up with (or perhaps unbeknownst to me, already has) how to (a) improve users’ memories (b) remind them of phrases and experiences without annoying the love out of them, and/or (c) miraculously, markedly improve the state of the art of speech recognition. But assuming none of the above occur in the next 12-18 months, I believe most of what I have written is inevitable. At least, it is if we want to have a vibrant ecosystem for third parties.
In this podcast, Voicebot.ai’s Bret Kinsella talks with NFL Labs’ Ian Campbell and Bondad.fm’s John Gillilan about how the NFL has embraced voice. The topics included:
– Goal is engaging with fans on multiple channels since fan expectations are higher nowadays as many are tech savvy.
– The NFL’s partners also expect more and the benefit to the NFL is additional product integration opportunities.
– Started with a lot of small prototypes in voice. Started with an Alexa Skill (‘Rookie’s Guide to the NFL’) last offseason. The skill teaches new fans the rules, including an international audience (games now played in London and Mexico City). Most of their voice endeavors so far have been on Alexa, but they do have some content on the Google Assistant too.
– You need to rethink your content for a voice platform. You can’t write for voice in a vacuum; you need to hear how it sounds. How you spell things matters since it’s part of your personality, as does the type of music behind the voice, etc. So it’s more than just scripting.
– Voice brings a lot of truths to your content. For example, for the ‘Rookie’s Guide’ skill, they had to consider how to explain the jargon and commentary that accompanies the rules. A unique language & nomenclature exists for every industry.
– So far, the NFL has done several types of Flash Briefings: Definitions, News, Editorials, Quizzes, Games & Storytelling.
– Used both synthetic Polly voice (the one offered by Amazon called “Matthew”) and a real player (Maurice Jones-Drew) and a sportscaster (Cole Wright). They are looking at VocalID’s service too. They have tried proto personas to see what works – and if it works, they build on that.
– They tried an avatar of ‘Football Frank,’ which used the Polly voice of Matthew.
– They spend a lot of time trying to help fans get back on track if they make a request that “fails” – they do that with some humor to lessen the blow of a failure.
– They have a multimodal project that is just internal now. They use a ‘hear, see, do’ principle to try to adjust to the differences from voice-only to a screen addition.
This VoiceFirst.fm podcast hosted by Bradley Metrock with three evangelists from Samsung’s Bixby explores where Bixby is headed. Here are a few nuggets:
1. The ability of Samsung televisions (and other Samsung appliances) to offer voice assistant help can be a differentiator down the road. For example, you’re watching a football game and a “clipping” penalty is called. You can ask the TV to explain what “clipping” is – and a graphic will pop up with the explanation.
2. Amazon struggles with discoverability issues since more than 100k skills are now in the library. Google’s challenge is that it only allows a limited number of third-parties to make Actions for its library. For Samsung, you can make a capsule and it will stand out because you’ll be a first mover, since Bixby is relatively new. Like Amazon, Samsung encourages third-parties to contribute capsules.
[For those new to voice, Amazon uses the term “Skill”; Google uses “Action”; and Samsung uses “Capsule” as their way of identifying the same thing – essentially an “app” but these things are played from a voice assistant rather than a mobile phone.]
3. When it comes to privacy, Bixby has the functionality for you to go back and delete any (or all) of your “utterances.” Meaning you can delete anything you asked Bixby to do.
This Voicebot.ai podcast with the Mayo Clinic’s Dr. Sandhya Pruthi and Joyce Even is interesting for those helping their organizations get into voice because the Mayo Clinic was a first mover and these speakers share some details about how they got started. The points include:
1. The Mayo Clinic is a content-driven organization. It was already involved in educating the public & medical staff through multiple mediums, including chat bots.
2. They started with a first aid skill to try it out. And since then, they’ve constantly been building on that. They didn’t start with a concrete plan, just generally going with the flow. Taking content built for Web or print and converting it for voice is an art & science. Shorter answers are required, and you need to predict how a question will be asked.
3. They conducted a pilot in which nurses told patients, once the doctor was done with them, that they could ask a voice assistant about wound care upon discharge. It’s an example of how you can use a patient’s “down time” when they are alone back in a room to get more educated about their condition. The pilot was highly successful from both the medical staff’s and the patients’ perspectives. Now they’re planning on rolling out a pilot for the emergency room.
4. The speakers noted that some patients are either loath to ask their doctor certain questions (e.g., they worry they would look stupid for asking, or they have privacy concerns) or they forget their questions when the doctor comes in. Oftentimes, the family also has a lot of questions. The voice assistant can help with efficiency & education.
5. Amazon asked Mayo Clinic to provide first-party content (i.e., content that is part of Alexa’s core; you don’t have to ask for Alexa to open a Mayo Clinic skill). It took some work to convert the third-party content they had developed into first-party content.
6. A content team leads voice at the Mayo Clinic. Bret remarked that’s unusual as it typically is a team from marketing, product or IT.
7. The Mayo Clinic voice doesn’t have a persona. They eventually may have one – or maybe even multiple personas depending on the type of interaction (e.g., the audience is a particular type of patient, their own doctors, etc.) – but it may prove unnecessary, in which case they won’t do it. Still early days.
8. The Mayo Clinic has a digital strategy that stretches out to 2030. A few possibilities about how voice may evolve are interactions with a voice app that is empathetic (e.g., it will get to really know you & can cater to your needs); voice apps that are more proactive by reaching out & being more engaged (e.g., “did you take your meds?”); and freeing up providers to be more efficient by dramatically cutting down on the four hours they spend per day doing medical records today.
Hat tip to Witlingo’s Ahmed Bouzid for turning me on to a fantastic month-long webinar series about Alexa Flash Briefings from Vixen Labs’ Suze Cooper and BBC’s Peter Stewart. They have been posting a new video each day during the month of February explaining how to best produce a Flash Briefing. The topics they cover range widely and include:
– How to come up with your show name
– How long should your episode titles be
– Don’t bother with show notes
– How to stake your claim in a crowded audio field
– Why it matters that Google & Spotify have joined the personalized audio content revolution
This Voicebot.ai podcast provides ten short interviews from the CES conference. At the 44:18 mark, Bret talks to Rain’s Nithya Thadani who discusses how voice is changing not just consumer behaviors, but employer-employee relationships. Among Nithya’s comments were these:
– Voice can be used for training & other ways to improve worker efficiencies.
– Bret talked about the consumerization of IT.
– Enterprise is growing as companies continue to look at removing inefficiencies. Nithya talked about observing employees who have developed “hacks” – unexpected odd behaviors – in an effort to get around obstacles. And how these hacks can be rendered unnecessary to improve worker effectiveness.
– Interestingly, it seems like Rain has a few clients in the mindfulness industry – including Headspace, which is looking at a voice experience that knows that you’re on the move and recommends listening to a walking meditation talk.
In this podcast, Voicebot.ai’s Bret Kinsella talks with John Kelvie from Bespoken about how “domains” will replace voice apps. It’s an interesting discussion – albeit a little hard to follow at times – so I recommend reading John’s blog about this concept before you listen to the podcast. Here’s an excerpt from the blog:
If you want to talk to someone, call your Mom. If you want to build a voice experience:
– One-shot is preferred
– Where one-shot is impossible, quick, contextual follow-ups are the next best option
– As a last resort, attempt an extended multi-turn dialog
Is this to say that extended interactions are a complete failure? Absolutely not! But describing them as conversations misses the point. How are they not conversations?
– They are not open-ended
– They lack important context – such as body language, intonation, emphasis, and past interactions
– They have very poor understanding – both in terms of speech and intent recognition
All of these things are likely to improve radically over a five to ten-year time horizon. But 12-18 months? Not so much. Almost certainly not enough to change what is feasible for most implementers.