Many companies pay lip service to the idea that their brands have a personality or persona. With voice, that is literally true because companies need to decide on the voice’s gender, accent, level of sophistication, tone – and more. It’s a true litmus test for how seriously companies take their brand.
So with voice, your brand has a tone. As Susan Westwater discusses in her “Pragmatic Talk” podcast (episode 4), you need to define your brand – or it will be defined for you. That means not just the words uttered, but the actual tone.
Voice assistants scare some people because of privacy concerns. I get that. But in my opinion, we have already lost the war when it comes to privacy. Even if you are paranoid enough to go “off-the-grid,” we still know where you are. And let’s face it, going off-the-grid kinda takes the fun out of life. But maybe there is hope after all – this “voicebot.ai” podcast with Vijay Balasubramaniyan, the founder & CEO of Pindrop Security, blew my mind.
Vijay explains how audio is so rich that you can determine the source of a phone call from just the “audio characteristics.” These audio characteristics include loss (i.e., just a few milliseconds of breaks in speech), noise (i.e., delay in providing the speech & the background sounds) & frequency. Pindrop Security’s core technologies use:
1. Deep voice – who you are based on your voice
2. Phone printing – what you have based on the device that you use to speak
3. Behavior printing – what you do based on your behavior
Why should you care? Because identity theft is rampant. And because your providers pepper you with an increasingly complex set of questions to confirm who you are when you contact them. What if both of these issues could be solved? That would be nice, right?
Pindrop Security sells their unique services to your providers – banks, insurance & credit card companies – so that they can authenticate that you are who you say you are based on just the first few sentences you utter when you place a call to them. This Forbes article describes it pretty well. Here’s an excerpt:
The release of Pindrop Passport, a new authentication tool that the startup unveiled on Thursday, is a big first step. Passport scans a caller’s voice, behavior (how they press their phone to input a pin) and the phone’s signature (whether it’s rerouted or from the wrong geography) to come up with risk scores in the time it takes to speak a sentence, the company says. That quick verification process not only reduces fraud but can shave as much as 55 seconds off the average call – equivalent to $1 saved in employee time, and potentially millions annually.
And while technology exists to artificially reproduce someone’s voice, Pindrop Security’s system can distinguish when a voice has been synthetically stitched together due to limits in our ability to simulate a person’s exact vocal chords…
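To put the excerpt’s numbers in perspective, here is a rough back-of-the-envelope sketch. It takes the “55 seconds” and “$1” figures from the excerpt (which describes 55 seconds as the upper end) and works out the employee cost per hour they imply. The call volume of 5 million is purely a hypothetical assumption of mine, not something from the article.

```python
# Figures quoted in the excerpt (55 seconds is "as much as", i.e. a best case).
SECONDS_SAVED_PER_CALL = 55
DOLLARS_SAVED_PER_CALL = 1.00


def implied_hourly_cost(seconds_saved, dollars_saved):
    """Employee cost per hour implied by 'N seconds is worth D dollars'."""
    return dollars_saved / seconds_saved * 3600


def annual_savings(calls_per_year, dollars_saved_per_call=DOLLARS_SAVED_PER_CALL):
    """Total yearly savings if every call saved the quoted amount."""
    return calls_per_year * dollars_saved_per_call


# Hypothetical call center handling 5 million calls a year (my assumption).
HYPOTHETICAL_CALLS_PER_YEAR = 5_000_000

print(f"Implied cost per hour: ${implied_hourly_cost(55, 1.00):.2f}")
print(f"Annual savings: ${annual_savings(HYPOTHETICAL_CALLS_PER_YEAR):,.0f}")
```

So the “millions annually” claim only needs a provider that handles a few million calls a year, assuming the best-case saving applies to every call.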
As noted in this “voicebot.ai” piece, new analysis from Score Publishing estimates that NY Times bestselling authors/publishers will lose $17 million this year in sales because of poor voice assistant search recognition. Score Publishing assumed that only about 20% of failed queries led to a lost sale – the $17 million could be higher/lower if that assumption under/overestimates how determined someone is to make the purchase.
Here are other facts from the article:
– Voice assistants overall answered only 43.1% of the queries, but that figure rose to 55% when the toughest of the four questions was removed
– Google Assistant was the top performer, successfully answering 72.5% of the queries
– Microsoft Cortana and Amazon Alexa followed with 60.8% and 44.2% respectively
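To see how much the $17 million estimate leans on that 20% assumption, here is a small sketch. It assumes the estimate scales in direct proportion to the share of failed queries that become lost sales, which is my simplification and not necessarily how Score Publishing built its model.

```python
# Figures from the article: $17 million lost, assuming 20% of failed
# queries led to a lost sale.
BASELINE_LOSS = 17_000_000
BASELINE_RATE = 0.20


def estimated_loss(lost_sale_rate):
    """Scale the baseline estimate linearly with the lost-sale rate.

    Linear scaling is an assumption made for illustration only.
    """
    return BASELINE_LOSS * lost_sale_rate / BASELINE_RATE


for rate in (0.10, 0.20, 0.30):
    print(f"{rate:.0%} of failed queries lost -> ${estimated_loss(rate):,.0f}")
```

Under that simple assumption, halving the rate to 10% gives roughly $8.5 million, while raising it to 30% gives roughly $25.5 million.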
When you initially set up a new voice assistant, you have the opportunity to set up a “voice profile.” I remember that when speech recognition first became publicly available, you had to spend 3-6 months with the program to get it to learn your voice (see this article). That was years ago.
Not true anymore. Now it takes Amazon’s Alexa or Google Assistant just a few minutes (here are Amazon’s instructions). And if you will be sharing your assistant, you can teach it to learn the voices of others by having them set up a voice profile too. [This functionality was created by Amazon a few years back, as noted in this article.]
But what if someone comes along and uses your assistant without creating a voice profile? My experience is that the device will likely work just fine. Speech recognition is so advanced these days that it can be successfully used without training it to recognize your voice. Pretty amazing. Of course, if accents vary, setting up a new voice profile may be necessary. But even different genders haven’t seemed to throw my devices off…
Brushing your teeth is important. For parents with young kids, that can be one of the biggest challenges of the day. Enter “Chompers” (“Alexa, start Chompers!”) from Gimlet Media. As seen in this two-minute promo clip, Chompers can help make brushing fun. I have talked to a few parents who have used it – and their kids who didn’t want to brush definitely changed their tune after trying Chompers.
Here’s a comment from Gimlet Media’s “Head of Product” about how they got the idea:
The original idea actually came out of a hackathon that P&G did, and we loved it so much we decided to work on it together. One of the most tricky parts was keeping it simple. With Alexa skills, there is no app icon, you can’t scroll through a list of things, and there are no notifications. So we had to keep it extremely simple. The one element we did add (besides the content itself) was a “streaks” functionality to support the toothbrushing habit.
This video captures a 40-minute presentation by Ha-Hoa Hamano – Senior Product Manager, Voice Platforms of NPR – at the recent “Lingo Fest” conference. It was interesting to hear Ha-Hoa explain how NPR develops new voice experiences by following this 4-step process:
Not only is NPR developing 5-minute flash briefings – one for each rush hour – it’s leveraging other ways to use voice. NPR recognizes that the reinvention of radio includes on-demand audio (e.g., podcasts), curated content and near-live experiences. It also recognizes that the voice experience is ubiquitous. It’s worth watching the video to understand how important proper analysis is when deciding how best to use voice.
Here’s a webpage where NPR explains all the different ways you can hear NPR content on voice platforms…
As noted in this Wired article – entitled “Does Your Doctor Need a Voice Assistant?” – voice assistants can help doctors dramatically cut down the amount of time they spend writing notes about their appointments. Here’s an excerpt:
It’s a problem that started when doctors switched from handwritten records to electronic ones. Health care organizations have tried more manual fixes—human scribes either in the exam room or outsourced to Asia and dictation tools that can only convert text verbatim. But these new assistants—you’ll meet Suki in a sec—go one step further. Equipped with advanced artificial intelligence and natural language processing algorithms, all a doc has to do is ask them to listen. From there they’ll parse the conversation, structure it into medical and billing lingo, and insert it cleanly into an EHR.
I remember all too clearly how the first batch of websites in the late ’90s tended to be “brochureware.” Since the Internet was so new, companies didn’t know better and basically converted their printed marketing material into a website. Not very useful.
So what makes a good Alexa skill or Google action? Since it’s audio, the natural reaction is for folks to first think of uploading long audio files onto the “voice cloud.” But that is merely a podcast or song – something that we can easily access in a non-voice world. True, accessing those things via a digital assistant is nice – you don’t need to click any buttons, etc. – but it doesn’t really move the needle in leveraging voice’s potential.
Voice’s potential is the interactive nature of it. The first things that come to mind are sets of FAQs – as your customers can ask questions to find out a bunch of basic information. FAQs work well because of what they are by definition – something that is frequently asked. And if the answers are short, all the better. Studies show – or if they don’t exist yet, they will show – that people want their answers short.
So when building your first skills, ask yourself “what questions are people asking us every day?” Then build a skill to handle those requests first. We are in the early innings of voice – and as we learn more about what is desirable (and feasible), skills will become better designed to match what there truly is a need for.
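As a rough illustration of the “FAQ first” idea, here is a minimal sketch of matching a spoken question to a short answer. It is plain Python rather than an actual Alexa skill or Google action: the questions, answers, `answer` function and word-overlap matching are all hypothetical, and a real skill would rely on the platform’s own intent and utterance model instead.

```python
import re

# Hypothetical FAQ entries: the questions customers ask every day,
# each paired with a deliberately short spoken answer.
FAQS = {
    "what are your hours": "We are open 9 to 5, Monday through Friday.",
    "where are you located": "We are at 123 Example Street.",
    "how do i reset my password": "Say 'reset password' and follow the prompts.",
}

FALLBACK = "Sorry, I don't know that one yet."


def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def answer(utterance, min_overlap=2):
    """Return the answer whose question shares the most words with the utterance."""
    words = tokenize(utterance)
    best_question, best_score = None, 0
    for question in FAQS:
        score = len(words & tokenize(question))
        if score > best_score:
            best_question, best_score = question, score
    # Too little overlap means we are probably guessing, so fall back.
    if best_score < min_overlap:
        return FALLBACK
    return FAQS[best_question]


print(answer("What are your hours today?"))
print(answer("Tell me a joke"))
```

Questions that keep hitting the fallback are a handy list of what to build next.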
In their “Pragmatic Talk” podcast (episode 8), Susan & Scot Westwater mention these tasks as the “low-hanging fruit” for voice:
1. Customer support for simple things (e.g., a “hotline” of sorts)
2. Answers to easy questions (e.g., what are our hours)
3. Sign up or purchase something
4. Explanation of how to use your product or service (e.g., how to install something)
As the “wellness” industry grows along with an aging “Baby Boomer” population, the popularity of mobile apps to assist those looking for mindfulness also grows. A good example is the “Headspace” app – which includes bite-sized guided meditations and hundreds of themed sessions for stress, anxiety, sleep, etc.
It’s only natural that a bevy of skills is also available for those that seek wellness. Here are ten examples:
1. Sleep and Relaxation Sounds – With over 11,000 reviews – most of them five-star – this one is popular. You can pick from a list of 125 sounds and then let it loop until you say stop – or until a specific length of time that you tell it in advance. “Alexa open Sleep Sounds.”
2. Healing Sounds – Popular skill that plays relaxing sounds. You can select your sound or just listen to the one offered. In-skill purchasing available to buy additional sounds. “Alexa open Healing Sounds.”
3. Relaxing Sounds: Indian Flute – Great musical accompaniment for meditation, yoga, healing and complete relaxation. Of course, if you know the name of a musician who plays the Indian flute, you can just play their music directly. But if you don’t, this skill is for you. “Alexa, Open Indian Flute.”
5. The Daily Task – This is a skill that helps you change step by step. I like the idea of this one – but it uses a synthetic voice. “Alexa, open The Daily Task.”
6. Guided Meditation: Meditation of the Day for Calm – Daily meditations between 3 and 8 minutes long, with 80 or so in total. Tells you upfront how long a particular meditation will be. Uses a human voice. “Alexa, open Guided Meditation.”
7. Headspace – You need a Headspace account to access this skill. Provides new daily meditations and the same kind of content as the app mentioned above. “Alexa, open Headspace.”
8. Fitbit – You need a Fitbit account to access this skill. Helps you keep track of how you’re meeting your fitness goals – whether it be number of steps walked, number of hours slept, etc. “Alexa, ask Fitbit how I’m doing today.”
10. Happy Days – Random positive quotes. They’re short – so the synthetic voice might be palatable for some. “Alexa, open Happy Days.”
My pet peeve is that this is one of those areas where a synthetic voice is not a good match. It’s hard to get relaxed when listening to a Polly voice. So the best of these skills use sounds & music – or human voices…
If you say “Alexa, no soup for you,” she’ll make you giggle with a reply of “come back one year, next” (shout-out to “Seinfeld” fans). But if I do come back in one year, how far will voice have progressed? This keynote presentation from voicebot.ai’s Bret Kinsella at the “Alexa Conference” in January is chock-full of stats – and some harbingers of things to come.
On page 34, Bret explains how the “reactive assistant” will evolve from being “context free” to “context first,” meaning that a response from a voice assistant will take into consideration the context of the discussion. If you’re sad, the assistant will be sad. If you’re ecstatic, the assistant will want to party. That’s pretty evolved stuff!
Bret also talks about the emergence of a “proactive assistant.” This means that your voice assistant will look out for your interests more and bring things to your attention that you otherwise might miss. For example, maybe your assistant will run a search for flights to compare prices against a flight you bought recently – and if it finds a cheaper flight, you might be able to get a refund to match that lower rate. Bret describes this as bots evolving from “gofers” to “agents.”