A layperson’s exploration of all things voice

Category Archives: Building Skills

May 14, 2020

People Have Asked to Set Their Alarms in 3000 Different Ways

This Voicebot.ai podcast provides ten short interviews from the CES conference. At the 22:01 mark, Bret talks to XAPPmedia’s Pat Higbie, who discusses how speaking to voice apps is so different from a human-to-human conversation. Among Pat’s comments were these:

– According to a panelist at CES, there are 3000 ways that people have asked to set alarms. So it’s difficult to predict how humans will ask for even a simple function to be performed.
– With voice, you are giving a simple command for an area that has a complex syntax.
– Every time you tweak your voice app to accommodate new ways that humans can ask for something, you run the risk of breaking what you’ve built. So you have to be mindful of your existing syntax.
– Right now, there’s a lot of information out there about good design but not a lot about the engineering necessary to pull it off. In essence, there currently is a lack of engineering talent that knows how to deal with complex syntax.
– Multimodal use of voice is rising and there’s a lot of work still ahead for that too. Providers will have to account for those using screens – and those not using them – when they design.
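
Pat’s point about tweaks breaking what you’ve built can be illustrated with a tiny regression check. This is only a sketch under loud assumptions: the regex patterns, the function names and the sample phrasings are all made up for illustration, and a real voice platform would use its own interaction model rather than hand-written patterns. The idea is simply to keep a list of phrasings that already work and re-run it after every change.

```python
import re

# Hypothetical patterns for a "set alarm" intent (illustration only).
ALARM_PATTERNS = [
    re.compile(r"^(?:set|create) (?:an |my )?alarm (?:for|at) (?P<time>.+)$"),
    re.compile(r"^wake me (?:up )?at (?P<time>.+)$"),
]


def match_alarm(utterance):
    """Return the requested time if the utterance matches a pattern, else None."""
    text = utterance.lower().strip()
    for pattern in ALARM_PATTERNS:
        found = pattern.match(text)
        if found:
            return found.group("time")
    return None


# Phrasings that already work, paired with the time we expect to extract.
REGRESSION_CASES = {
    "Set an alarm for 7 am": "7 am",
    "set alarm at 6:30": "6:30",
    "Wake me up at noon": "noon",
    "wake me at 5": "5",
}


def run_regression(cases=REGRESSION_CASES):
    """Return the utterances whose result no longer matches the expectation."""
    return [u for u, expected in cases.items() if match_alarm(u) != expected]
```

After adding a new pattern, an empty list from `run_regression()` suggests the older phrasings still work; any utterance it returns is one the tweak has broken.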

April 21, 2020

Your Bad Memory Limits Voice’s Possibilities (At Least With Today’s Tech)

Recently, I blogged about this podcast, in which Voicebot.ai’s Bret Kinsella talks with John Kelvie from Bespoken about how “domains” will replace voice apps. I enjoyed John’s blog about this concept so much that I wanted to excerpt again from the blog:

Most of what is written above hinges on just a couple of key observations:

– Users do not remember invocation names

– Multi-turn dialogs sort-of work – in some cases they are useful and appropriate. But for the most part they annoy users and should be avoided.

If you accept these observations, everything else I’ve laid out follows fairly naturally. Of course, someone might come up with (or perhaps unbeknownst to me, already has) how to (a) improve users’ memories (b) remind them of phrases and experiences without annoying the love out of them, and/or (c) miraculously, markedly improve the state of the art of speech recognition. But assuming none of the above occur in the next 12-18 months, I believe most of what I have written is inevitable. At least, it is if we want to have a vibrant ecosystem for third parties.

March 19, 2020

The NFL’s Push Into Voice

In this podcast, Voicebot.ai’s Bret Kinsella talks with NFL Labs’ Ian Campbell and Bondad.fm’s John Gillilan about how the NFL has embraced voice. The topics included:

– The goal is to engage with fans on multiple channels, since fan expectations are higher nowadays and many fans are tech savvy.
– The NFL’s partners also expect more and the benefit to the NFL is additional product integration opportunities.
– Started with a lot of small prototypes in voice. Started with an Alexa Skill (‘Rookie’s Guide to the NFL’) last offseason. The skill teaches new fans the rules, with an international audience in mind (games are now played in London and Mexico City). Most of their voice endeavors so far have been on Alexa, but they do have some content on the Google Assistant too.
– You need to rethink your content for a voice platform. You can’t write for voice in a vacuum; you need to hear how it sounds. How you spell things matters as it’s part of your personality, what type of music plays behind the voice matters, etc. So it’s more than just scripting.
– Voice brings a lot of truths to your content. For example, for the ‘Rookie’s Guide’ skill, they had to consider how to explain the jargon and commentary that accompanies the rules. A unique language & nomenclature exists for every industry.
– So far, the NFL has done several types of Flash Briefings: Definitions, News, Editorials, Quizzes, Games & Storytelling.
– They have used a synthetic Polly voice (the one offered by Amazon called “Matthew”), a real player (Maurice Jones-Drew) and a sportscaster (Cole Wright). They are looking at VocalID’s service too. They have tried proto personas to see what works – and if it works, they build on that.
– They tried an avatar of ‘Football Frank,’ which used the Polly voice of Matthew.
– They spend a lot of time trying to help fans get back on track if they make a request that “fails” – they do that with some humor to lessen the blow of a failure.
– They have a multimodal project that is just internal for now. They use a ‘hear, see, do’ principle to try to adjust to the differences between voice-only and voice with a screen.
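
For the curious, here is a rough idea of how a skill might ask Alexa to speak a line in the “Matthew” voice mentioned above. It is a minimal sketch, not the NFL’s actual code: it assumes Alexa’s SSML `voice` tag, and the helper name `speak_as` and the sample line are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def speak_as(text, voice="Matthew"):
    """Wrap text in SSML that requests a named voice.

    Assumes the platform supports an SSML <voice name="..."> tag;
    the text is XML-escaped so characters like '&' do not break the markup.
    """
    return f"<speak><voice name={quoteattr(voice)}>{escape(text)}</voice></speak>"


# Hypothetical line for a persona such as 'Football Frank'.
ssml = speak_as("Offense & defense each get eleven players.")
```

The string in `ssml` is what a skill would hand back as its spoken response; swapping the `voice` argument is one cheap way to try out different personas.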

March 10, 2020

The Early Days: How Samsung’s Bixby is Shaping Up

This VoiceFirst.fm podcast, hosted by Bradley Metrock with three evangelists from Samsung’s Bixby, explores where Bixby is headed. Here are a few nuggets:

1. The ability of Samsung televisions (and other Samsung appliances) to offer voice assistant help can be a differentiator down the road. For example, you’re watching a football game and a “clipping” penalty is called. You can ask the TV to explain what “clipping” is – and a graphic will pop up with the explanation.

2. Amazon struggles with discoverability issues since more than 100k skills are now in the library. Google’s challenge is that it only allows a limited number of third parties to make Actions for its library. For Samsung, you can make a capsule and it will stand out because Bixby is relatively new and you’ll be a first mover. Like Amazon, Samsung encourages third parties to contribute capsules.

[For those new to voice, Amazon uses the term “Skill”; Google uses “Action”; and Samsung uses “Capsule” as their way of identifying the same thing – essentially an “app” but these things are played from a voice assistant rather than a mobile phone.]

3. When it comes to privacy, Bixby has the functionality for you to go back and delete any (or all) of your “utterances.” Meaning you can delete anything you asked Bixby to do.

March 2, 2020

The Mayo Clinic’s Voice Experience as a First Mover

This Voicebot.ai podcast with the Mayo Clinic’s Dr. Sandhya Pruthi and Joyce Even is interesting for those helping their organizations get into voice because the Mayo Clinic was a first mover and these speakers share some details about how they got started. The points include:

1. The Mayo Clinic is a content-driven organization. It was already involved in educating the public & medical staff through multiple mediums, including chat bots.

2. They started with a first aid skill to try it out. And since then, they’ve constantly been building on that. They didn’t start with a concrete plan, just generally going with the flow. Taking content built for Web or print and converting it for voice is an art & science. Shorter answers are required, and you need to predict how a question will be asked.

3. They conducted a pilot in which, after the doctor was done with them, patients were told by nurses that they could ask a voice assistant about wound care upon discharge. It’s an example of how you can use a patient’s “down time”, when they are alone back in a room, to get them more educated about their condition. The pilot was highly successful from both the medical staff’s and the patients’ perspectives. Now they’re planning on rolling out a pilot for the emergency room.

4. The speakers noted that some patients are either loath to ask their doctor certain questions (eg. they worry they would look stupid, or they have privacy concerns) or they forget their questions when the doctor comes in. Oftentimes, the family also has a lot of questions. The voice assistant can help with efficiency & education.

5. Amazon asked Mayo Clinic to provide first-party content (ie. content that is part of Alexa’s core; you don’t have to ask for Alexa to open a Mayo Clinic skill). It took some work to convert the third-party content they had developed into first-party content.

6. A content team leads voice at the Mayo Clinic. Bret remarked that’s unusual as it typically is a team from marketing, product or IT.

7. The Mayo Clinic voice doesn’t have a persona. They eventually may have one – or maybe even multiple personas depending on the type of interaction (eg. the audience is a particular type of patient, their own doctors, etc.) – but it may prove unnecessary, in which case they won’t do that. Still early days.

8. The Mayo Clinic has a digital strategy that stretches out to 2030. A few possibilities about how voice may evolve are interactions with a voice app that is empathetic (eg. it will get to really know you & can cater to your needs); voice apps that are more proactive by reaching out & being more engaged (eg. “did you take your meds?”); and freeing up providers to be more efficient by dramatically cutting down on the four hours per day they currently spend on medical records.

February 27, 2020

Practice Pointers for Alexa Flash Briefings

Hat tip to Witlingo’s Ahmed Bouzid for turning me on to a fantastic month-long webinar series about Alexa Flash Briefings from Vixen Labs’ Suze Cooper and BBC’s Peter Stewart. They have been posting a new video each day during the month of February explaining how best to produce a Flash Briefing. The topics they cover range widely and include:

– How to come up with your show name

– How long your episode titles should be

– Don’t bother with show notes

– How to stake your claim in a crowded audio field

– Why it matters that Google & Spotify have joined the personalized audio content revolution

February 26, 2020

Using Voice to Improve Employee Productivity

This Voicebot.ai podcast provides ten short interviews from the CES conference. At the 44:18 mark, Bret talks to Rain’s Nithya Thadani, who discusses how voice is changing not just consumer behaviors, but employer-employee relationships. Among Nithya’s comments were these:

– Voice can be used for training & other ways to improve worker efficiencies.
– Bret talked about the consumerization of IT.
– Enterprise is growing as companies continue to look at removing inefficiencies. Nithya talked about observing employees who have developed “hacks” – unexpected odd behaviors – in an effort to get around obstacles. And how these hacks can be rendered unnecessary to improve worker effectiveness.
– Interestingly, it seems like Rain has a few clients in the mindfulness industry – including HeadSpace, which is looking at a voice that knows that you’re on the move and recommends listening to a walking meditation talk.

February 14, 2020

How “Voice” Isn’t Anywhere Near Ready to Be “Conversations”

In this podcast, Voicebot.ai’s Bret Kinsella talks with John Kelvie from Bespoken about how “domains” will replace voice apps. It’s an interesting discussion – albeit a little hard to follow at times – so I recommend reading John’s blog about this concept before you listen to the podcast. Here’s an excerpt from the blog:

If you want to talk to someone, call your Mom. If you want to build a voice experience:

– One-shot is preferred
– Where one-shot is impossible, quick, contextual follow-ups are the next best option
– As a last resort, attempt an extended multi-turn dialog

Is this to say that extended interactions are a complete failure? Absolutely not! But describing them as conversations misses the point. How are they not conversations?

– They are not open-ended
– They lack important context – such as body language, intonation, emphasis, and past interactions
– They have very poor understanding – both in terms of speech and intent recognition

All of these things are likely to improve radically over a five to ten-year time horizon. But 12-18 months? Not so much. Almost certainly not enough to change what is feasible for most implementers.
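
The order of preference in John’s excerpt (one-shot first, then a quick contextual follow-up) can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any real assistant works: the `handle` function, the `awaiting_time` marker and the simple time pattern are all hypothetical.

```python
import re

# Very rough pattern for times like "7 am" or "6:30 pm" (illustration only).
TIME_RE = re.compile(r"\b(\d{1,2}(?::\d{2})?\s*(?:am|pm))\b")


def handle(utterance, context=None):
    """Return (reply, new_context); context remembers a pending question."""
    text = utterance.lower()
    time_match = TIME_RE.search(text)

    # One-shot: the request and the time arrive in a single utterance.
    if "alarm" in text and time_match:
        return f"Alarm set for {time_match.group(1)}.", None

    # Quick contextual follow-up: we asked for the time on the previous turn.
    if context == "awaiting_time" and time_match:
        return f"Alarm set for {time_match.group(1)}.", None

    # The request is clear but the time is missing: ask one short question.
    if "alarm" in text:
        return "For what time?", "awaiting_time"

    return "Sorry, I can only set alarms.", None
```

For example, “Set an alarm for 7 am” is answered in one shot, while “Set an alarm” triggers the single follow-up question and the next utterance (“6:30 pm”) completes the request. Anything longer than that would be the extended multi-turn dialog the excerpt treats as a last resort.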

February 12, 2020

Food Network’s Push Into Voice

In this podcast, Voicebot.ai’s Bret Kinsella talks with Tim McElreath from Discovery and Food Network about the “deapplication” of Alexa Skills and voice assistants (starting at the 25-minute mark). Tim describes how the Food Network has multiple channels of content – which are all interconnected – and how that poses a challenge for his company using voice. He notes he’s in discussions with Amazon about how to unpack all of this. Right now, the arrangement they have with Amazon is a hybrid of the traditional voice model – for example, they offer live cooking classes and a customer can use an Echo Show to see the schedule of classes visually.

Tim explains his prediction of “deapplication” so that brands may keep their first-party position and also orchestrate what they do. Tim notes that “deapplication” applies mainly to those that have a mix of content types like his company does (ie. different mediums and platforms). But skills themselves are still valuable as are self-contained solutions. This may sound a little confusing…it is…