A layperson’s exploration of all things voice

February 3, 2020

Developing Better Voice Experiences for Kids

In this podcast, Voicebot.ai’s Bret Kinsella talks with Patricia Scanlon, CEO of Soapbox Labs, the leader in automatic speech recognition for kids. Here are some of the points made:

1. Existing voice solutions don’t work so well for kids because kids differ from adults both physically (e.g., vocal cords) and behaviorally (e.g., slower or faster speaking speeds).

2. Voice studies tend to note that we are at 95% accuracy today, but that figure is misleading because it holds only under perfect circumstances. It’s sort of like a good Boolean search on the Web: if you put in the perfect search terms, you are more likely to get a better result. With voice, you might get 95% if the circumstances are perfect (e.g., a crisp speaker and no ambient noise in the background). Bret noted that his research shows the #1 desire of consumers for voice is to be understood better.

3. Patricia has spent more than six years collecting voice samples from kids. She explains that you need a phonetically balanced dataset to really move the needle for kids. That means collecting samples of different styles of speech from a large number of kids (rather than having every kid say the same 100 words), in both controlled & uncontrolled environments, and across different accents and languages. What matters is not just what they’re saying but how they’re saying it. Many adult speech datasets already existed, but only a few decades-old datasets of kids’ speech were around before Patricia started her collection (those old ones were gathered as part of grant-funded, controlled academic exercises, so they had limited utility).

4. Soapbox Labs is in (or will be in) the market for EdTech, language learning, smart toys and gaming. Examples include screening preliterate kids for learning difficulties and screening for fluency levels. A teacher doesn’t have the time to listen to all their kids reading more than sporadically. With voice, this “one-on-one” evaluation can happen more often, as voice can help spot & correct errors and prompt encouragement, and it can do this cost-effectively on a large scale.

5. Near the end of the podcast, there is a good conversation about possibly moving kid interactions with voice devices “to the edge,” meaning off the cloud (and thus being more private), and about whether, and when, this could be done cost-effectively (and how much privacy really matters in this context). Another item discussed is bifurcating the types of answers given so that they are age-appropriate.

6. At the end, there is also a discussion about what the definition of a “conversation” is. Right now, voice interactions are instructional & transactional, as we’re only scratching the surface of natural language understanding.