A layperson’s exploration of all things voice

Category Archives: Educational Use

August 12, 2020

A Synthetic Voice That Feels More “Human”

This “voicebot.ai” article is fascinating. It talks about a new synthetic voice called ‘Cerence Reader’ that is based on neural text-to-speech (TTS) and designed to read news to commuters.

This synthetic voice is unlike others: it pauses, breathes and uses inflection, so it seems more humanlike than other synthetically generated voices. The article notes that listeners' preference for human voices grows stronger as passages of content get longer. So this could be a big development, as the article observes: “Having a synthetic speech option that sounds more humanlike opens up audio access to far more content than is available today and it can be available in real-time.”

February 27, 2020

Practice Pointers for Alexa Flash Briefings

Hat tip to Witlingo’s Ahmed Bouzid for turning me on to a fantastic month-long webinar series about Alexa Flash Briefings from Vixen Labs’ Suze Cooper and BBC’s Peter Stewart. They have been posting a new video each day during February explaining how best to produce a Flash Briefing. The topics they cover range widely and include:

– How to come up with your show name

– How long your episode titles should be

– Don’t bother with show notes

– How to stake your claim in a crowded audio field

– Why it matters that Google & Spotify have joined the personalized audio content revolution

February 3, 2020

Developing Better Voice Experiences for Kids

In this podcast, Voicebot.ai’s Bret Kinsella talks with Patricia Scanlon, CEO of Soapbox Labs, the leader in automated speech recognition for kids. Here are some of the points made:

1. Existing voice solutions don’t work well for kids because kids differ from adults both physically (e.g., vocal cords) and behaviorally (e.g., speaking more slowly or more quickly).

2. Voice studies often cite 95% accuracy today, but that figure is misleading because it may hold only under ideal conditions. It’s a bit like a Boolean search on the Web: if you enter the perfect search terms, you are more likely to get a good result. With voice, you might reach 95% if conditions are perfect (e.g., a crisp speaker and no ambient background noise). Bret noted that his research shows consumers’ #1 desire for voice is to be understood better.

3. Patricia has spent more than six years collecting voice samples from kids. She explains that truly moving the needle for kids requires a phonetically balanced dataset: collecting different styles of speech from a large number of kids (rather than having them all repeat the same 100 words), in both controlled and uncontrolled environments. What matters is not just what kids say but how they say it, and samples should come from children with different accents and languages. Many adult speech datasets already existed, but before Patricia started her collection only a few decades-old datasets of kids’ speech were available, and because those were gathered as grant-funded, controlled academic exercises, they had limited utility.

4. Soapbox Labs is (or will be) in the market for EdTech, language learning, smart toys and gaming. Examples include screening preliterate kids for learning difficulties and screening for fluency levels. A teacher doesn’t have time to listen to each child read more than sporadically. With voice, this “one-on-one” evaluation can happen more often, since the technology can help spot and correct errors and prompt encouragement, and it can do so cost-effectively at scale.

5. Near the end of the podcast, there is a good conversation about possibly moving kids’ interactions with voice devices “to the edge,” meaning off the cloud (and thus more private), whether and when this could be done cost-effectively, and how much privacy really matters in this context. Another topic is bifurcating the types of answers given so that they are age-appropriate.

6. At the end, there is also a discussion of what counts as a “conversation”: right now, voice interactions are instructional and transactional, as we’re only scratching the surface of natural language understanding.
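To make point 2 a little more concrete: speech-recognition accuracy is often reported as word error rate (WER), with “accuracy” taken as roughly 1 minus WER. The podcast doesn’t say how the 95% figure was measured, so what follows is only a minimal Python sketch, assuming the common word-level edit-distance definition of WER. The example sentences are made up for illustration (they are not data from the podcast). They show how one misheard word in a 20-word request already costs five points, and how a few more slips from a noisier recording push accuracy down fast.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein (edit) distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion (word was dropped)
                dist[i][j - 1] + 1,         # insertion (extra word heard)
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical 20-word request and two made-up transcripts of it.
REFERENCE = ("please read me the news about the weather in dublin "
             "and tell me if i need to bring an umbrella")
# Near-ideal conditions: one substitution ("weather" -> "whether").
CLEAN = ("please read me the news about the whether in dublin "
         "and tell me if i need to bring an umbrella")
# Noisier conditions: three dropped words plus two substitutions.
NOISY = ("read me news about whether in dublin "
         "and tell me if i need to ring an umbrella")

if __name__ == "__main__":
    for label, transcript in [("clean", CLEAN), ("noisy", NOISY)]:
        wer = word_error_rate(REFERENCE, transcript)
        print(f"{label}: WER {wer:.0%}, accuracy {1 - wer:.0%}")
    # clean: WER 5%, accuracy 95%
    # noisy: WER 25%, accuracy 75%
```

Real evaluations usually also normalize punctuation and numbers before scoring. This sketch only lowercases and splits on whitespace.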

November 21, 2019

How Skills Can Complement Educational Offerings

Here’s a note from the RAIN agency about this ‘voicebot.ai’ article:

Coursera has launched a new Alexa skill which will function as a companion to students who are utilizing their educational platform. According to Voicebot AI, “Once the skill is installed and linked, students can ask Alexa questions about ongoing assignments test scores, grades, and other information about their progress in a class.” As Amazon continues to build up their children and education offerings, the Coursera skill supports a valuable assistive use case. The skill begins to show how young students can bring Alexa into their daily routines and use the medium to help manage their schoolwork and provide on-demand information about academic performance.

Here’s an excerpt from the ‘voicebot.ai’ article:

The new skill is possible thanks to the newly created Alexa Education Skill API. While the API is still only in preview, Coursera and other companies in the digital education industry are starting to test many of its features. While Coursera’s is focused on students checking on assignments, education tech companies Blackboard and Canvas have launched skills for both students and parents to get updates on school assignments. Meanwhile, Kickboard and ParentSquare are using the new API as a way for schools to communicate with parents about the latest news and even behavioral reports on their children.