A layperson’s exploration of all things voice

Category Archives: Quality of Voice

April 20, 2021

Analyzing Voice Sentiment With Spotify’s Coming Tech

Here is a note from the “Rain” agency:

Voice is a natural channel for conveying our emotions and feelings. However, voice technology is still trying to crack analyzing sentiment and how these data points can inform the creation of emotionally intelligent voice experiences and assistants. We have seen voice assistants like Amazon Alexa and Google Assistant expand their speaking styles to include different emotions in certain responses, reflecting more humanlike interactions.

However, monitoring emotions on the consumer side is still a nascent technology. Amazon has made some steps toward realizing this with its health and wellness wearable Halo, which tracks users’ tone through their voices to make them more aware of their communication styles. This week, we’ve seen a new update in emotion recognition with Spotify’s patent approval of technology that analyzes listeners’ moods. Even though the patent only points to a small amount of features and a targeted use case, we are beginning to see how voice technology might leverage sentiment to provide more relevant recommendations and experiences for consumers in many contexts.

January 29, 2021

Wow! Alexa Now Can Infer Your Intent

As part of Amazon’s long-term goal to make talking to Alexa more natural, a new “infer your intent” functionality has been built. Here’s an excerpt from this “Verge” article:

Finding new ways to use Amazon’s Alexa has always been a bit of a pain. Amazon boasts that its AI assistant has more than 100,000 skills, but most are garbage and the useful ones are far from easy to discover. Today, though, Amazon announced it’s launched a new way to surface skills: by guessing what users are after when they talk to Alexa about other tasks.

The company refers to this process as “[inferring] customers’ latent goals.” By this, it means working out any questions that are implied by other queries. Amazon gives the example of a customer asking “How long does it take to steep tea?” to which Alexa will answer “five minutes” before asking the follow-up: “Would you like me to set a timer for five minutes?”

August 12, 2020

A Synthetic Voice That Feels More “Human”

This “voicebot.ai” article is fascinating. It talks about a new synthetic voice called ‘Cerence Reader’ that is based on neural text-to-speech (TTS) and designed to read news to commuters.

This synthetic voice is unlike others – it has pausing, breathing and inflection – so it seems more humanlike than other synthetically generated voices. The article notes that the preference for human voices becomes more intense when the passages of content are longer. So this could be a big development, as the article notes: “Having a synthetic speech option that sounds more humanlike opens up audio access to far more content than is available today and it can be available in real-time.”

May 28, 2020

Customizing Google’s Devices to Determine Their Sensitivity to Sounds

As noted in this “Verge” article, Google is gradually rolling out a feature that lets you customize how sensitive Google Assistant devices are to their wake phrase. Here’s an excerpt:

Google is starting to “roll out gradually” a feature allowing you to customize voice detection sensitivity on Google Assistant devices, a spokesperson confirmed to The Verge. Although the feature has not been widely released yet, Mishaal Rahman, editor-in-chief of XDA Developers, was able to access the feature by tinkering with the Google Home app’s code, he told The Verge.

Screenshots that Rahman posted to Twitter show the “‘Hey Google’ Sensitivity” feature displaying a slider that allows you to increase or reduce the sensitivity with which Google Assistant devices pick up the command “Hey Google.” Last September, Google confirmed there was an update coming that would let you adjust listening sensitivity. The new setting is meant to decrease accidental activations of your Assistant.

May 5, 2020

Amazon Upgrades Alexa’s Long-Form Speech

This Voicebot.ai article is exciting. Amazon is expanding Alexa’s voices and speech style choices for voice app developers: its text-to-speech service, known as “Polly,” has more than a dozen new voices and styles from which to choose. I imagine that will continue to grow.

Even more exciting is that Alexa will sound more natural when speaking for longer periods. Here’s an excerpt from the article:

A lot of interaction with Alexa involves short responses or rote lines. That starts to sound strange when the voice assistant speaks for more than a few seconds. Alexa’s new long-form speaking style is designed to address that disconnect and make using Alexa feel as comfortable as talking to another human. Since people don’t speak the same way when uttering a sentence as they do when expounding for multiple paragraphs, the addition is likely to be popular with voice apps that read magazines, books, or transcribed conversations from a podcast out loud. For now, this style is only an option for Alexa in the United States.

“For example, you can use this speaking style for customers who want to have the content on a web page read to them or listen to a storytelling section in a game,” Alexa developer Catherine Gao explained in Amazon’s blog post about the new feature. “Powered by a deep-learning text-to-speech model, the long-form speaking style enables Alexa to speak with more natural pauses while going from one paragraph to the next or even from one dialog to another between different characters.”
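For voice app developers curious about what this looks like in practice, here is a minimal sketch of how a skill response might request the long-form style. It assumes the style is selected with Alexa’s SSML tag `amazon:domain` set to `long-form`. The helper name `long_form_ssml` is hypothetical and not part of any Amazon SDK; it only builds the SSML string that a skill would return as its speech output.

```python
from xml.sax.saxutils import escape


def long_form_ssml(paragraphs):
    """Build an SSML string that asks Alexa to read text in the long-form style.

    Hypothetical helper for illustration: it wraps each paragraph in <p> tags
    (so pauses fall between paragraphs) and escapes characters such as '&'
    that are not allowed to appear raw in SSML.
    """
    body = "".join(f"<p>{escape(text)}</p>" for text in paragraphs)
    return f'<speak><amazon:domain name="long-form">{body}</amazon:domain></speak>'


if __name__ == "__main__":
    print(long_form_ssml(["Once upon a time.", "Tom & Jerry met again."]))
```

Keeping one paragraph per `<p>` element is a design choice for this sketch: it gives the speech model explicit paragraph boundaries, which is where the excerpt says the more natural pauses occur.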

March 19, 2020

The NFL’s Push Into Voice

In this podcast, Voicebot.ai’s Bret Kinsella talks with NFL Labs’ Ian Campbell and Bondad.fm’s John Gillilan about how the NFL has embraced voice. The topics included:

– Goal is engaging with fans on multiple channels since fan expectations are higher nowadays as many are tech savvy.
– The NFL’s partners also expect more and the benefit to the NFL is additional product integration opportunities.
– Started with a lot of small prototypes in voice. Started with an Alexa Skill (‘Rookie’s Guide to the NFL’) last offseason. The skill teaches new fans the rules, including an international audience (games now played in London and Mexico City). Most of their voice endeavors so far have been on Alexa, but they do have some content on the Google Assistant too.
– You need to rethink your content for a voice platform. Can’t write for voice in a vacuum, you need to hear how it sounds – so how you spell things matters as it’s part of your personality, what type of music plays behind the voice matters, etc. So it’s more than just scripting.
– Voice brings a lot of truths to your content. For example, for the ‘Rookie’s Guide’ skill, they had to consider how to explain the jargon and commentary that accompanies the rules. A unique language & nomenclature exists for every industry.
– So far, the NFL has done four types of Flash Briefings: Definitions, News, Editorials, Quizzes, Games & Storytelling.
– Used both synthetic Polly voice (the one offered by Amazon called “Matthew”) and a real player (Maurice Jones-Drew) and a sportscaster (Cole Wright). They are looking at VocalID’s service too. They have tried proto personas to see what works – and if it works, they build on that.
– They tried an avatar of ‘Football Frank,’ which used the Polly voice of Matthew.
– They spend a lot of time trying to help fans get back on track if they make a request that “fails” – they do that with some humor to lessen the blow of a failure.
– They have a multimodal project that is just internal now. They use a ‘hear, see, do’ principle to try to adjust to the differences from voice-only to a screen addition.

January 21, 2020

Emotive & Experiential Sounds as Part of Your Audio Brand

This Voicebot.ai podcast provides ten short interviews from the CES conference. CES offered a “voice” track for the first time – and Voicebot’s Bret Kinsella noted that voice was expected to be integrated with technology this time around, a step beyond being just a novelty.

At the 11:50 mark, Bret talks to Audiobrain’s Audrey Arbeeny. Audrey’s company helps companies that are adding sounds as part of their branding – sort of the audio analogue to logos. It’s an art & science that goes beyond playing simple sounds to identify your brand. She notes she’s on a panel with someone at Whirlpool – and how Whirlpool uses different sounds in their washing machines that are emotive & experiential.

There are sonic branding guidelines to consider, which for some companies will be on a global basis – particularly because you want the brand to be consistent. Here are other examples of what Audiobrain has done for clients. Fascinating stuff!

December 2, 2019

We the People Prefer Human to Synthetic Voices

I’m so excited to see this first study about what consumers want out of voice design – a joint study conducted by voicebot.ai, Voices.com and Pulse Labs. I’ll be blogging about a few of the results – the first one being this excerpt:

It will surprise no one that our user panel expressed a preference for human voices over synthetic voices. Observers have long suspected that users preferred to hear humans. In our testing, human voices received an overall rating of 3.86 on a scale of 1.00-5.00 compared to 2.25 for synthetic voices generated by artificial intelligence. That difference reflects a 71.6% higher rating for human voices over the synthetic alternative.
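For anyone wondering where the 71.6% comes from, it appears to be the relative difference between the two ratings quoted above, measured against the synthetic score. A small sketch of that arithmetic follows; the variable names are mine, and the formula is my reading of the excerpt rather than a method stated in the study.

```python
# Ratings quoted in the excerpt (scale of 1.00-5.00).
human_rating = 3.86
synthetic_rating = 2.25

# Relative difference, using the synthetic rating as the baseline.
relative_increase = (human_rating - synthetic_rating) / synthetic_rating
percent_higher = round(relative_increase * 100, 1)

print(f"Human voices rated {percent_higher}% higher")
```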

November 4, 2019

How to Create a Brand “Voice”

I’ve been blogging a lot about this free 48-page playbook by “360i” about what you should know about voice from a marketing perspective. Today I focus on its chapter about how to create a brand “voice” (starts on page 54). It’s about considering the tone of your brand’s voice on voice assistants – the use of voice “standards” and more. Here’s an excerpt:

A successfully consistent tone of voice employs specific tone standards that mimic the way you sound when you speak and are in line with your personality. A set of tone standards allows you to sound consistently like yourself, while giving you the flexibility to adapt to various situations. Take Oprah for example – maybe her tone standards are earnest, boisterous, and empathetic. These are the underlying spaces she plays in, dialing up and down different ones for different situations.

If she’s giving away a car, it’s in a boisterous voice with a touch of empathy. If she’s interviewing the first lady, it’s earnest and boisterous in light-hearted moments. Donating a school to an underprivileged community? She has an empathetic voice with a hint of an earnest tone for that too.