A layperson’s exploration of all things voice

Category Archives: Building Skills

May 25, 2021

Analyzing Customer Interactions to Glean Customer Preferences

Here’s a teaser from the “Rain” agency about their weekly note:

Millions of calls are answered in call centers per day, generating conversations rife with insights into customer behaviors and preferences. This week, we take a look into how companies are analyzing conversations to enhance the customer experience. Voice technology companies like Observe.AI and CallMiner are deploying tools that analyze the interactions (including sentiment and even silences) between customers and call center representatives, providing recommendations on how these employees can improve their service.

With information on what a customer is searching for, brands like Spotify and Amazon are hoping to personalize content in real time — setting the stage for how conversation analysis might be used in phone calls, voice experiences, and more to elevate marketing.

January 29, 2021

Wow! Alexa Now Can Infer Your Intent

As part of Amazon’s long-term goal to make talking to Alexa more natural, a new “infer your intent” functionality has been built. Here’s an excerpt from this article from the “Verge”:

Finding new ways to use Amazon’s Alexa has always been a bit of a pain. Amazon boasts that its AI assistant has more than 100,000 skills, but most are garbage and the useful ones are far from easy to discover. Today, though, Amazon announced it’s launched a new way to surface skills: by guessing what users are after when they talk to Alexa about other tasks.

The company refers to this process as “[inferring] customers’ latent goals.” By this, it means working out any questions that are implied by other queries. Amazon gives the example of a customer asking “How long does it take to steep tea?” to which Alexa will answer “five minutes” before asking the follow-up: “Would you like me to set a timer for five minutes?”
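For the builders out there, here is a toy sketch of the "answer, then offer the implied next step" pattern from the tea example. To be clear, this is not how Amazon does it (the article does not describe the implementation); the rule table, the `Reply` class and the `respond` function are all made-up names for illustration, and the only data in it is the steep-tea example from the excerpt.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    answer: str
    follow_up: Optional[str] = None  # the suggested "latent goal", if any


# Hypothetical lookup table: task -> duration in minutes.
# The single entry comes from the article's tea example.
DURATION_MINUTES = {"steep tea": 5}

# Assumed rule: a "how long does it take to X" question implies a timer.
HOW_LONG = re.compile(r"how long does it take to (?P<task>[a-z ]+)")


def respond(query: str) -> Optional[Reply]:
    """Answer a duration question and offer a timer as the follow-up."""
    match = HOW_LONG.search(query.lower())
    if match is None:
        return None  # not a duration question, so nothing to infer
    minutes = DURATION_MINUTES.get(match.group("task").strip())
    if minutes is None:
        return None  # we do not know the answer, so offer nothing
    return Reply(
        answer=f"{minutes} minutes",
        follow_up=f"Would you like me to set a timer for {minutes} minutes?",
    )


if __name__ == "__main__":
    print(respond("How long does it take to steep tea?"))
```

A real assistant would presumably learn these pairings from usage data rather than a hand-written table; the sketch only shows the shape of the interaction.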

November 17, 2020

Tackling the Voice App Discovery Problem

I used to blog more about the challenges of building skills – and how to make it easier for skills to be discovered by folks once you launch them. Here’s a nice piece about skill discovery from voicebot.ai, along with this excerpt:

Zevenbergen’s good fortune to rise to the top of the search results for “What’s my horoscope?” would not be wasted. He had already built user retention elements into his Google Action. First, Zevenbergen wanted to fulfill the intent of the user very efficiently. He had a goal of giving the horoscope as quickly as possible. For new users that simply required determining their birthday. Not only was there a target of delivering the full horoscope within 10 seconds, the Action tells new users that they will receive their horoscope within 10 seconds. It sets expectations and removes a potential concern about how much the user may be committing to with this particular voice experience.

Second, he found that shorter, more concise horoscopes were leading to more completed sessions. There may be an opportunity to convey many paragraphs worth of horoscope goodness but that’s often the opposite of what people want when interacting on a smart speaker. They want the facts. Ensuring users heard the entire horoscope before abandoning the session also gave him a captive audience that was still around when the Action offered to add “What’s my zodiac sign” to a routine or notification. “What’s my zodiac sign” is now getting about 5,000 opened notifications from Google Assistant each day. If you compare that to the DAUs for the Action you will conclude that nearly 85% of daily user sessions are driven by this single technique.

May 14, 2020

People Have Asked to Set Their Alarms in 3000 Different Ways

This Voicebot.ai podcast provides ten short interviews from the CES conference. At the 22:01 mark, Bret talks to XAPPmedia’s Pat Higbie, who discusses how different speaking to voice apps is from a human-to-human conversation. Among Pat’s comments were these:

– According to a panelist at CES, there are 3000 ways that people have asked to set alarms. So it’s difficult to predict how humans will ask for even a simple function to be performed.
– With voice, you are giving a simple command for an area that has a complex syntax.
– Every time you tweak your voice app to accommodate new ways that humans can ask for something, you run the risk of breaking what you’ve built. So you have to be mindful of your existing syntax.
– Right now, there’s a lot of information out there about good design but not a lot about the engineering necessary to pull it off. In essence, there currently is a lack of engineering talent that knows how to deal with complex syntax.
– Multimodal use of voice is rising and there’s a lot of work still ahead for that too. Providers will have to account for those using screens – and those not using them – when they design.
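
To make Pat's point about breaking your existing syntax a bit more concrete, here is a minimal sketch of one way to guard against it: keep a list of phrasings that already work and re-check all of them whenever a new pattern is added. Everything here (the patterns, `parse_alarm_time`, `REGRESSION_CASES`) is hypothetical and hand-rolled with regular expressions; real platforms use trained language models rather than regexes, so treat this as an illustration of the testing habit, not of how Alexa or Google Assistant parse speech.

```python
import re
from typing import Optional

# A clock time such as "7", "6:30", "7 am" or "9:15 p.m."
TIME = r"(?P<time>\d{1,2}(?::\d{2})?(?:\s*[ap]\.?m\.?)?)"

# Hypothetical phrasing families that all mean "set an alarm".
ALARM_PATTERNS = [
    re.compile(r"\b(?:set|create|make)\b.*\balarm\b.*\b(?:for|at)\s+" + TIME),
    re.compile(r"\bwake me(?: up)?\s+(?:at|by)\s+" + TIME),
]


def parse_alarm_time(utterance: str) -> Optional[str]:
    """Return the requested time in a compact form, or None if no alarm intent."""
    text = utterance.lower().strip()
    for pattern in ALARM_PATTERNS:
        match = pattern.search(text)
        if match:
            # Drop spaces and dots so "9:15 p.m." becomes "9:15pm".
            return re.sub(r"[\s.]", "", match.group("time"))
    return None


# Phrasings that are known to work today. Re-run after every tweak.
REGRESSION_CASES = {
    "Set an alarm for 7 am": "7am",
    "wake me up at 6:30": "6:30",
    "Please make an alarm at 9:15 p.m.": "9:15pm",
    "what's the weather": None,
}


def failing_cases() -> list:
    """List every known phrasing whose result no longer matches."""
    return [u for u, want in REGRESSION_CASES.items() if parse_alarm_time(u) != want]


if __name__ == "__main__":
    print(failing_cases() or "all known phrasings still work")
```

The idea is simply that each newly supported phrasing gets added to the table, so the next tweak cannot quietly undo it.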

April 21, 2020

Your Bad Memory Limits Voice’s Possibilities (At Least With Today’s Tech)

Recently, I blogged about this podcast, in which Voicebot.ai’s Bret Kinsella talks with John Kelvie from Bespoken about how “domains” will replace voice apps. I enjoyed John’s blog about this concept so much that I wanted to excerpt again from the blog:

Most of what is written above hinges on just a couple of key observations:

– Users do not remember invocation names

– Multi-turn dialogs sort-of work – in some cases they are useful and appropriate. But for the most part they annoy users and should be avoided.

If you accept these observations, everything else I’ve laid out follows fairly naturally. Of course, someone might come up with (or perhaps unbeknownst to me, already has) how to (a) improve users’ memories (b) remind them of phrases and experiences without annoying the love out of them, and/or (c) miraculously, markedly improve the state of the art of speech recognition. But assuming none of the above occur in the next 12-18 months, I believe most of what I have written is inevitable. At least, it is if we want to have a vibrant ecosystem for third parties.

March 19, 2020

The NFL’s Push Into Voice

In this podcast, Voicebot.ai’s Bret Kinsella talks with NFL Labs’ Ian Campbell and Bondad.fm’s John Gillilan about how the NFL has embraced voice. The topics included:

– Goal is engaging with fans on multiple channels since fan expectations are higher nowadays as many are tech savvy.
– The NFL’s partners also expect more and the benefit to the NFL is additional product integration opportunities.
– Started with a lot of small prototypes in voice. Started with an Alexa Skill (‘Rookie’s Guide to the NFL’) last offseason. The skill teaches new fans the rules, including an international audience (games now played in London and Mexico City). Most of their voice endeavors so far have been on Alexa, but they do have some content on the Google Assistant too.
– You need to rethink your content for a voice platform. You can’t write for voice in a vacuum; you need to hear how it sounds. How you spell things matters since it’s part of your personality, as does the type of music behind the voice, etc. So it’s more than just scripting.
– Voice brings a lot of truths to your content. For example, for the ‘Rookie’s Guide’ skill, they had to consider how to explain the jargon and commentary that accompanies the rules. A unique language & nomenclature exists for every industry.
– So far, the NFL has done several types of Flash Briefings: Definitions, News, Editorials, Quizzes, Games & Storytelling.
– Used a synthetic Polly voice (the one offered by Amazon called “Matthew”), a real player (Maurice Jones-Drew) and a sportscaster (Cole Wright). They are looking at VocalID’s service too. They have tried proto-personas to see what works – and if something works, they build on that.
– They tried an avatar of ‘Football Frank,’ which used the Polly voice of Matthew.
– They spend a lot of time trying to help fans get back on track if they make a request that “fails” – they do that with some humor to lessen the blow of a failure.
– They have a multimodal project that is just internal now. They use a ‘hear, see, do’ principle to try to adjust to the differences from voice-only to a screen addition.

March 10, 2020

The Early Days: How Samsung’s Bixby is Shaping Up

This VoiceFirst.fm podcast hosted by Bradley Metrock with three evangelists from Samsung’s Bixby explores where Bixby is headed. Here are a few nuggets:

1. The ability of Samsung televisions (and other Samsung appliances) to offer voice assistant help can be a differentiator down the road. For example, you’re watching a football game and a “clipping” penalty is called. You can ask the TV to explain what “clipping” is – and a graphic will pop up with the explanation.

2. Amazon struggles with discoverability issues since more than 100k skills are now in the library. Google’s challenge is that it only allows a limited number of third-parties to make Actions for its library. For Samsung, you can make a capsule and it will stand out because Bixby is relatively new, so you’ll be a first-mover. Like Amazon, Samsung encourages third-parties to contribute capsules.

[For those new to voice, Amazon uses the term “Skill”; Google uses “Action”; and Samsung uses “Capsule” as their way of identifying the same thing – essentially an “app” but these things are played from a voice assistant rather than a mobile phone.]

3. When it comes to privacy, Bixby has the functionality for you to go back and delete any (or all) of your “utterances.” Meaning you can delete anything you asked Bixby to do.

March 2, 2020

The Mayo Clinic’s Voice Experience as a First Mover

This Voicebot.ai podcast with the Mayo Clinic’s Dr. Sandhya Pruthi and Joyce Even is interesting for those helping their organizations get into voice because the Mayo Clinic was a first mover and these speakers share some details about how they got started. The points include:

1. The Mayo Clinic is a content-driven organization. It was already involved in educating the public & medical staff through multiple mediums, including chat bots.

2. They started with a first aid skill to try it out. And since then, they’ve constantly been building on that. They didn’t start with a concrete plan, just generally going with the flow. Taking content built for Web or print and converting it for voice is an art & science. Shorter answers are required, and you need to predict how a question will be asked.

3. Conducted a pilot in which, after the doctor was done with them, nurses told patients that they could ask a voice assistant about wound care upon discharge. It’s an example of how you can use a patient’s “down time” when they are alone back in a room to get more educated about their condition. Highly successful from both the medical staff’s and patients’ perspectives. Now they’re planning on rolling out a pilot for the emergency room.

4. The speakers noted that some patients are either loath to ask their doctor certain questions (e.g., they worry they would look stupid asking, or they have privacy concerns) or they forget their questions when the doctor comes in. Oftentimes, the family also has a lot of questions. The voice assistant can help with efficiency & education.

5. Amazon asked Mayo Clinic to provide first-party content (i.e., content that is part of Alexa’s core; you don’t have to ask for Alexa to open a Mayo Clinic skill). It took some work to convert the third-party content they had developed into first-party content.

6. A content team leads voice at the Mayo Clinic. Bret remarked that’s unusual as it typically is a team from marketing, product or IT.

7. The Mayo Clinic voice doesn’t have a persona. They eventually may have one – or maybe even multiple personas depending on the type of interaction (e.g., the audience is a particular type of patient, their own doctors, etc.) – or they may decide it’s unnecessary and not do it. Still early days.

8. The Mayo Clinic has a digital strategy that stretches out to 2030. A few possibilities about how voice may evolve are interactions with a voice app that is empathetic (e.g., it will get to really know you & can cater to your needs); voice apps that are more proactive by reaching out & being more engaged (e.g., “did you take your meds?”); and freeing up providers to be more efficient by dramatically cutting down on the four hours they spend per day doing medical records today.