A layperson’s exploration of all things voice

Category Archives: Educational Use

April 13, 2021

Facebook Developing Virtual Assistants to Give the News

Here’s the intro from this “voicebot.ai” article:

Facebook is building a virtual assistant to digest, summarize, and read articles for users, according to a Buzzfeed report on a closed company meeting. The TL;DR tool, internet slang for “too long, didn’t read,” would use AI to condense articles into bullet points and read them out loud for users who want to skip reading it for themselves. The social media giant also discussed other planned AI projects, including creating a neural sensor to detect thoughts as they form and turn them into commands to AI assistants.

March 17, 2021

Twitter Expands Its Social Audio App – “Spaces” – to Android

Here’s the intro from this voicebot.ai article: “Twitter has expanded the beta for its Spaces social audio platform to Android devices. The social media giant had previously limited Spaces to iOS devices, but people using Android can now apply to try out Spaces as Twitter pushes to refine the platform for wide release.”

Meanwhile, here’s a voicebot.ai podcast about the experiences of some experts with Clubhouse – the social audio app that I blogged about recently…

March 2, 2021

Social Audio Is Here! My Vid-Guide: “How Clubhouse Works (& Whether It’s Worth It)”

With voicebot.ai reporting that Clubhouse has surpassed 10 million members – I am among them – I put together this 12-minute video explaining how Clubhouse works and offering my ten cents about whether you should try it, plus a few bonus tips if you do indeed give it a “go”…

January 14, 2021

Amazon Alexa’s “Natural Turn Taking” is Nice!

Here’s a piece from voicebot.ai about new Amazon Alexa functionality that allows for more natural “turn taking” and lets you teach Alexa your preferences. See this excerpt:

There is turn taking today when conversing with Alexa. The user speaks then Alexa speaks. That is followed by the user and back to Alexa and so forth. It’s highly structured and doesn’t accommodate interruptions, tangents, or trackbacks very well. The current model is decidedly unlike how humans interact in conversation. Natural turn taking is definitely more accommodating to the vagaries of human conversation.

As good as the natural turn taking demo was, the feature that will probably have a bigger impact is the ability to teach Alexa your preferences. This is long overdue. For Alexa to be a truly personal assistant, it needs to know personal preferences. This knowledge can help make Alexa more useful every day. Prasad demonstrated this feature as well telling Alexa what he meant by certain phrases. However, the practical benefits of Alexa remembering your preferences are easily overshadowed in a two-minute demo by the scope of changes required to support natural turn taking.

December 15, 2020

‘Voice-Activated’ Museum Opens in DC

Since it’s here in my backyard, I’ve got to blog about it. Here’s an excerpt from this article by Voicebot.ai:

Planet Word combines stories with technology in ten learning galleries. An interactive conversation with a wall of words relates the history and development of English, using what visitors say to pick out words to spotlight with the embedded lights. Technology is also crucial to an exhibit with smart paintbrushes for drawing words. Visitors can also practice virtual conversations with speakers of rare languages.

On the performative end, visitors can show off their own speech-giving talents in a soundproof room with a teleprompter that plays eight famous speeches or in a poetry nook in the library, as well as visit a karaoke area for learning about songwriting and performing their favorites. Outside, artist Rafael Lozano-Hemmer installed a metallic weeping willow that continually plays 364 voices in almost as many languages. The museum is inside the Franklin School, a very appropriate choice as it was there that Alexander Graham Bell told Watson he needed him in the first-ever wireless voice transmission.

August 12, 2020

A Synthetic Voice That Feels More “Human”

This “voicebot.ai” article is fascinating. It talks about a new synthetic voice called ‘Cerence Reader’ that is based on neural text-to-speech (TTS) and designed to read news to commuters.

This synthetic voice is unlike others – it has pausing, breathing and inflection – so it seems more humanlike than other synthetically generated voices. The article notes that the preference for human voices becomes more intense when the passages of content are longer. So this could be a big development, as the article notes with this thought: “Having a synthetic speech option that sounds more humanlike opens up audio access to far more content than is available today and it can be available in real-time.”

February 27, 2020

Practice Pointers for Alexa Flash Briefings

Hat tip to Witlingo’s Ahmed Bouzid for turning me on to a fantastic month-long webinar series about Alexa Flash Briefings from Vixen Labs’ Suze Cooper and BBC’s Peter Stewart. They have been posting a new video each day during the month of February explaining how best to produce a Flash Briefing. The topics they cover range widely and include:

– How to come up with your show name

– How long your episode titles should be

– Don’t bother with show notes

– How to stake your claim in a crowded audio field

– Why it matters that Google & Spotify have joined the personalized audio content revolution

February 3, 2020

Developing Better Voice Experiences for Kids

In this podcast, Voicebot.ai’s Bret Kinsella talks with Patricia Scanlon, CEO of Soapbox Labs, the leader in automated speech recognition for kids. Here are some of the points made:

1. Existing voice solutions don’t work so well for kids because kids differ from adults both physically (e.g., vocal cords) and behaviorally (e.g., slower or faster speaking speeds).

2. Voice studies tend to note that we are at 95% accuracy today, but that figure is misleading because it might hold only under perfect circumstances. It’s sort of like a good Boolean search on the Web: if you put in the perfect search terms, you are more likely to get a better result. With voice, you might get 95% if the circumstances are perfect (e.g., a crisp speaker and no ambient noise in the background). Bret noted that his research shows that the #1 desire of consumers for voice is to be understood better.

3. Patricia has spent more than six years collecting voice samples from kids. She explains that you need a phonetically balanced dataset to really move the needle for kids: collecting samples of different styles of speech from a large number of kids, not having them all say the same 100 words, and recording in both controlled & uncontrolled environments. What matters is not just what they’re saying but how they’re saying it, so the samples also come from children with different accents and languages. Many adult speech datasets already existed, but only a few decades-old datasets of kids’ speech were around before Patricia started her collection (those old ones were gathered as part of grant-funded, controlled academic exercises, so they had limited utility).

4. Soapbox Labs is in (or will be in) the market for EdTech, language learning, smart toys and gaming. Examples include screening preliterate kids for learning difficulties and screening for fluency levels. A teacher doesn’t have the time to listen to all their kids reading more than sporadically. With voice, this “one-on-one” evaluation can happen more often, as voice can help spot & correct errors and prompt encouragement, and it can do this cost-effectively on a large scale.

5. Near the end of the podcast, there is a good conversation about possibly moving kid interactions with voice devices “to the edge,” meaning off the cloud (and thus making them more private). The conversation also covers whether, and when, this could be done cost-effectively, and how much privacy really matters in this context. Another item discussed is bifurcating the types of answers given so that they are age-appropriate.

6. At the end, there is also a discussion about how to define a “conversation”. Right now, voice interactions are instructional & transactional, as we’re only scratching the surface of natural language understanding.