That is contrary to what I have heard others say – as voice appears to be a logical place for people to get quick answers to simple questions. The Westwaters explain that voice isn’t just about Q&A – and that you don’t really know whether the FAQs you’ve created are actually what people ask about your stuff.
The Westwaters explain that you need to do real research to find out the way people will ask questions in a voice format – that they tend to ask in longer phrases when talking. For example, there might be 200 ways to say “turn up the volume.” So you’re going to need to spend some time developing even a simple FAQ skill because you’ll need to account for all the different ways that people ask for things.
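To make that point concrete, here’s a toy sketch in Python (this is not Alexa’s actual NLU – the intent name and phrasings below are made up for illustration) of why a single intent needs many sample utterances behind it:

```python
# Illustrative only: mapping many phrasings to one hypothetical
# "TurnUpVolume" intent via a hand-written list of sample utterances.
# Real voice platforms do fuzzier matching, but the design burden is
# the same – you have to anticipate the variations.

SAMPLE_UTTERANCES = {
    "TurnUpVolume": [
        "turn up the volume",
        "turn the volume up",
        "make it louder",
        "louder please",
        "crank it up",
        "increase the volume",
    ],
}

def match_intent(phrase):
    """Return the intent whose sample utterances contain the phrase."""
    normalized = phrase.lower().strip().rstrip(".!?")
    for intent, utterances in SAMPLE_UTTERANCES.items():
        if normalized in utterances:
            return intent
    return None
```

With only six phrasings listed, `match_intent("Make it louder!")` finds the intent but `match_intent("pump up the sound")` comes back empty – which is exactly why covering the “200 ways” takes real research time.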
Given that time to do your homework is essential, it’s good to start small – with voice, it’s best to do one or two things very well rather than a myriad of things sufficiently. When considering where you want to start with a skill, think about whether you want to focus on customer service, acquiring new potential customers, or retaining existing ones. It’s probably best to design a skill that focuses on just one of these three rather than try to accomplish all of them.
Then after starting small, you can launch and actually observe how folks are interacting with it – so you can further refine it to best match what people seem to want out of it. This is a whole lot of sage advice coming out of just a 3-minute podcast by the Westwaters!
My friend has a 5-year-old who apparently is wild about the “Freeze Dance Game” skill. So I went to enable the skill in my Amazon Alexa app – and I found that when you first try to enable a skill for kids, you need to provide credit card information to prove that an adult approved a kid thing. You don’t get charged anything. It’s just a vehicle to ensure parents know what their kid is doing.
Which is all kind of funny to me because it’s a harmless game. This is not kids racking up huge bills with in-game purchases – if anything, it’s the opposite. It makes you feel uneasy because you just gave Amazon your credit card information. Anyway, I digress…the game is awesome, as you can tell from this video:
I babysat my 9-year-old nephew over the weekend and was thinking of giving the popular game – “Yes Sire” – a whirl. But then I learned the game was rated “mature” – so that wasn’t happening. (There is an easy way to set up parental controls via your Alexa app, by the way.) So instead we tried the game with a few friends. “Alexa, play Yes Sire.”
I can see why the game is so popular. It was fun – and easy to play. Strategy was involved. You play as a medieval lord of the realm, presented with an expanding array of choices that become more difficult as you go. Make good choices and stay in power. Playing it a second time, the skill asks if you want to make an in-skill purchase – something that has worked well for “Volley,” the company that created “Yes Sire.”
As Susan Westwater discusses in her “Pragmatic Talk” podcast (episode 4), when building your skill, it’s critical that you plan for conversational repair. This means that planning for the unintended is as important as planning for how you think your users will interact with your skill.
That’s because it’s impossible to predict all the different ways that users will interact with your skill. It’s inevitable they will act in ways you don’t expect. They will go off-roading.
So you need to figure out how to bring your users back to the conversation without frustrating them with responses of “I don’t understand.” They don’t want a circular experience where they don’t move forward. You will need to give them periodic confirmation as part of your skill’s responses – but you also need to keep moving the ball forward for them.
These are early days, so users today understand the science isn’t perfect – but they will pretty quickly tire of not getting where they want to go.
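One common way to handle conversational repair is to escalate through progressively more helpful reprompts instead of looping on “I don’t understand.” Here’s a minimal sketch of that idea – the prompt wording and the three-step structure are my own illustrative assumptions, not from the podcast:

```python
# Sketch of escalating "conversational repair" prompts. Each failed
# turn gets a more concrete suggestion, so the user keeps moving
# forward instead of circling on a generic error message.

REPAIR_PROMPTS = [
    "Sorry, I missed that. You can ask about order status or store hours.",
    "Let's try again. For example, say 'store hours' to hear when we're open.",
    "I'm still having trouble. Let me take you back to the main menu.",
]

def repair_response(failed_attempts):
    """Pick a reprompt based on how many turns in a row have failed."""
    index = min(failed_attempts, len(REPAIR_PROMPTS) - 1)
    return REPAIR_PROMPTS[index]
```

Note how the last prompt offers an exit rather than another retry – the circular experience is exactly what users won’t tolerate.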
What if you could combine storytelling in a real book with the benefits that voice offers? That’s starting to happen. For example, check out the intro from this article:
Melissa and Matt Hammersley got the idea for Novel Effect when they were expecting their first child, Eleanor. They wanted to create something that would help them bond as a family and use technology to bring a little more magic into her life. A light bulb went off at their baby shower, when a friend did a theatrical reading of a book that would soon become Eleanor’s. What if technology could simulate that experience and turn story time into an almost cinematic experience?
They brought on a team of experts and began building Novel Effect, which uses voice recognition technology to follow along when someone reads a book out loud, adding music, sound effects, and other features.
More recently, the NY Times has started embedding invocations in articles so that you can learn more about the article’s topic through your voice assistant.
This isn’t even ‘new’ news. A few years back (as noted in this article – and a video), a robot – from China’s iFlytek – took China’s national medical licensing examination and passed. Not only did the robot pass the exam, it scored 456 points – 96 points above the passing threshold. This makes sense given that healthcare is one of the fields that has been taking AI seriously for some time. These robots actually aren’t meant to replace human doctors – they’re meant to be assistants that help human doctors improve their efficiency.
There’s been a number of “Siri Challenges” over the years. As noted in this article, the latest one involves musicians with iPhones asking Siri “What is one trillion to the tenth power.” Asking Siri that particular question is not new, but what these musicians have done with it has taken the challenge to an entirely new “beatbox” level:
Love the intro in this video by Amazon’s Paul Cutsinger (4:20 mark) explaining how voice is different from prior technological leaps because, for the first time, the technology has to learn about us rather than the other way around. It has to learn how we speak. Paul cautions against taking best practices from Web & mobile design and trying to make them work for voice. He also explains how we need design sensibilities built for voice – something he calls “situational design.” The device has to be aware of the situation so that it can react to the customer. A design tailored for conversation.
Here are some great pointers from Paul’s presentation:
1. Be Adaptable – Let speakers use their own words. Instead of trying to force the customer into choosing between a few buttons (like you might on a Web or mobile app), the customer chooses what they want. Your job is to anticipate what they might choose – and then train the natural language device to handle that. So you need to think through synonyms, utterances, intents, slots, etc.
2. Be Contextual – Individualize your entire interaction. For example, people don’t use only one greeting. For Web & mobile, you essentially train your users how to use what you built, and they use it that way consistently. It’s the exact opposite with voice – people don’t want to hear the same thing over & over again. People are very good at skimming – in voice, there is no skimming.
Consistency helps you with visual; it hurts you with voice. One way to handle this is to be random. But you need to be careful with that – the device might not understand the context. It needs memory of what happened before. What will the 10th time in the experience feel like to your customers?
3. Be Available – Keep your interactions top level. In Web & mobile, you need to have hierarchical design. Nested menus. You can’t have everything at the top level. With voice, you need a wide top level. It doesn’t overwhelm the user because they can’t see it. You can’t use nested menus for voice because the user can’t see it (and it’s unfair to expect them to memorize it).
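The “be contextual, but be careful with random” point can be sketched in a few lines of Python. This is a toy illustration under my own assumptions (the greeting text and the session dictionary are made up): vary the greeting randomly, but remember the last one so a returning user never hears the same line twice in a row.

```python
import random

# Random variation with memory: exclude whatever greeting the session
# heard last time, so randomness doesn't accidentally repeat itself.

GREETINGS = [
    "Welcome back!",
    "Good to hear from you again.",
    "Hi there – ready to pick up where we left off?",
]

def pick_greeting(session):
    """Choose a greeting different from the one used on the last turn."""
    choices = [g for g in GREETINGS if g != session.get("last_greeting")]
    greeting = random.choice(choices)
    session["last_greeting"] = greeting
    return greeting
```

That one line of memory (`last_greeting`) is the difference between “random” and “random but context-aware” – it’s a small answer to Paul’s question about what the 10th visit feels like.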
As soon as you find yourself putting together a flow chart when putting your skill together, you’re in trouble. Because the user experience is flipped on its head with voice – it’s your voice assistant responding to you, not the other way around like it is with Web & mobile. How do you deal with that?
You have a core utterance from the user, and then there is a situation. A “welcome,” and then a prompt to have the user do something. If the user comes back a second time, the situation is different – and the user expects the voice assistant to know that.
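Here’s a minimal sketch of that situational-design idea – the same invocation produces a different response depending on the situation. The profile structure and prompt wording are illustrative assumptions, not anything from Paul’s talk:

```python
# Situational design in miniature: one "welcome" handler, two
# situations. A first-time user gets oriented; a returning user is
# recognized and nudged forward.

def welcome(user_profile):
    """Return a welcome prompt that depends on whether the user is new."""
    if user_profile.get("visit_count", 0) == 0:
        prompt = ("Welcome! I can check order status or store hours. "
                  "Which would you like?")
    else:
        prompt = "Welcome back. Want to check on your last order again?"
    user_profile["visit_count"] = user_profile.get("visit_count", 0) + 1
    return prompt
```

Notice there’s no flow chart here – just the current situation deciding what the assistant says next.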