In this podcast about privacy & security, Voicebot.ai’s Bret Kinsella talks with Rani Molla, the lead data reporter for Recode, Todd Mozer, CEO of Sensory, and Maarten Lens-FitzGerald of Open Voice. Here are some of the points made:
1. Bret noted that the biggest story of 2019 was that contractors for many of the big voice players were listening to recordings made by smart speakers and other voice devices. Stories broke throughout the year as one company after another was reported to be allowing humans to listen in on our voice commands.
2. Rani discussed how this was bad form by voice companies because – even though only a small slice of voice interactions was listened to – the general public had no idea that this was happening, particularly because the privacy policies for these companies were vague as to what they could (or would) be doing in this area. Contractors were listening primarily for quality control purposes (e.g., when Alexa acted incorrectly on a voice command, contractors could listen to the recording to try to figure out a fix).
3. Todd stated that industry insiders knew this kind of thing was happening. He noted that some voice vendors – like his company – are working to provide voice “on the edge,” meaning that the processing happens on the device rather than in the cloud. These “self-contained” voice solutions can provide not only cheaper access to voice (because they use less energy) but also better security.
4. Maarten gave the European perspective – today, big tech companies are not trusted in Europe at all. He compared the early days of the Internet – when it seemed like a utopia – with now, when there is little trust online.
Other points made included:
– Treat privacy as an emotional issue for people. That’s why surprising them by not being upfront about what you’re doing is a real reputation killer.
– To some extent, privacy expectations vary among generations. Younger people have grown up with social media and expect less.
– Government and hackers already have your data. So if some companies have it too, does it matter anymore? Is there much you can do about it?
– Deepfake voice is coming. What does two-step authentication for voice look like? What other security measures can be taken?
According to this Edison Research report from last year, among people who do not own a smart speaker and are interested in acquiring one, the top two reasons they have not are:
– 63% are concerned that hackers could use a smart speaker to gain access to their home or personal information
– 55% are bothered that smart speakers are always listening
Interestingly, privacy is also a concern among those who own a smart speaker – at nearly the same rate as among those who don’t own one…
When I talk to people about the potential of voice, a common reaction is that they aren’t interested. Note that most of these people are around my age – over 50. Even when I explain the myriad of ways that voice can make life better for so many, I get shrugs. One of the biggest concerns is privacy. That’s why this note from the “RAIN” agency is welcome:
In response to multiple press reports and challenges from regulatory bodies, the tech industry has taken steps to reduce human oversight of voice recordings and provide users with more control. For now, Google and Apple have ceased human reviews of voice data transmitted through Assistant and Siri respectively. While Amazon has not halted their processes, they have created an option in the Alexa Privacy Settings for users to opt-out of their data being reviewed in this way. It seems the latest wave of pressure around data management has been heard by these companies loud and clear.
This TechCrunch article explains more about what Amazon recently did…
I often get asked about the regulatory state of affairs for voice since I’m a lawyer. This blurb from a recent Gibson Dunn memo notes one development:
On May 29, the California State Assembly passed a bill (A.B. 1395) requiring manufacturers of ambient listening devices like smart speakers to receive consent from users before retaining voice recordings, and banning manufacturers from sharing command recordings with third parties. The bill is currently being considered by the State Senate. Companies that manufacture smart devices which record commands by default and which use the data to train their automated systems should pay close attention to developments in this space.
A few years ago, Amazon launched a “verified parental consent” feature to enable skills to comply with child data protection laws (e.g., the “Children’s Online Privacy Protection Act”). The FTC updated its guidelines in mid-2017 to clarify that online services include “voice-over internet protocol services,” so businesses do indeed need to obtain permission to store a child’s voice.
Here’s how that works: the first time that you enable a “kid skill,” Alexa will prompt you to provide parental permission via the Alexa app. The verification process requires parents to either enter a one-time password sent via text to their phone or verify with a credit card.
Since parental consent will then apply to all kid skills, parents only have to complete this process once. So if a parent enables any kid skill, they’ll have effectively enabled consent for their kids to use Alexa as much as they want.
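To make the "consent once, applies to every kid skill" idea concrete, here is a minimal sketch in Python. It is only a toy model of the flow described above, not Amazon's actual implementation or API: the `Account` class, the `enable_kid_skill` function and the `verify_parent` callback are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical stand-in for a household's Alexa account."""
    parental_consent: bool = False  # one account-wide flag, not one per skill
    enabled_kid_skills: set = field(default_factory=set)


def enable_kid_skill(account, skill, verify_parent):
    """Enable a kid skill, asking for verification only if consent is missing.

    `verify_parent` is a hypothetical callable standing in for the one-time
    password or credit card check; it returns True when the check succeeds.
    """
    if not account.parental_consent:
        if not verify_parent():
            return False  # no consent given, so the skill stays disabled
        account.parental_consent = True  # remembered for all later kid skills
    account.enabled_kid_skills.add(skill)
    return True
```

In this model, the first call to `enable_kid_skill` triggers the verification step, and every later call skips it because the consent flag lives on the account rather than on the individual skill.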
So if you’re planning on building a skill for kids, you won’t be collecting parental consents yourself – you’ll be relying on this process that Amazon has instituted. Of course, when building a skill designed for kids, you should still be mindful not to do anything unethical that could hurt your reputation.
Voice assistants scare some people because of privacy concerns. I get that. But in my opinion, we already have lost the war when it comes to privacy. Even if you are paranoid enough to go “off-the-grid,” we still know where you are. And let’s face it, going off-the-grid kinda takes the fun out of life. But maybe there is hope after all – this “voicebot.ai” podcast with Vijay Balasubramaniyan, the founder & CEO of Pindrop Security, blew my mind.
Vijay explains how audio is so rich that you can determine the source of a phone call from just the “audio characteristics.” These audio characteristics include loss (i.e., breaks in speech lasting just a few milliseconds), noise (i.e., delay in delivering the speech, plus background sounds) & frequency. Pindrop Security’s core technologies use:
1. Deep voice – who you are based on your voice
2. Phone printing – what you have based on the device that you use to speak
3. Behavior printing – what you do based on your behavior
Why should you care? Because identity theft is rampant. And because your providers pepper you with an increasingly complex set of questions to confirm who you are when you contact them. What if both of these issues could be solved? That would be nice, right?
Pindrop Security sells their unique services to your providers – banks, insurance & credit card companies – so that they can authenticate that you are who you say you are based on just the first few sentences you utter when you place a call to them. This Forbes article describes it pretty well. Here’s an excerpt:
The release of Pindrop Passport, a new authentication tool that the startup unveiled on Thursday, is a big first step. Passport scans a caller’s voice, behavior (how they press their phone to input a pin) and the phone’s signature (whether it’s rerouted or from the wrong geography) to come up with risk scores in the time it takes to speak a sentence, the company says. That quick verification process not only reduces fraud but can shave as much as 55 seconds off the average call – equivalent to $1 saved in employee time, and potentially millions annually.
And while technology exists to artificially reproduce someone’s voice, Pindrop Security’s system can distinguish when a voice has been synthetically stitched together due to limits in our ability to simulate a person’s exact vocal cords…
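For readers who want a feel for how three separate signals (voice, device and behavior) might roll up into a single risk score, here is a rough sketch in Python. It is purely illustrative and is not Pindrop's method: the function names, the weights and the threshold are all made-up assumptions, and a real system would be far more sophisticated.

```python
def call_risk_score(voice_match, device_match, behavior_match,
                    weights=(0.5, 0.3, 0.2)):
    """Combine three match scores (each 0.0 to 1.0) into a risk score (0.0 to 1.0).

    A higher match means the caller looks more like the real account holder,
    so risk is one minus the weighted average match. The weights are invented
    for illustration only.
    """
    scores = (voice_match, device_match, behavior_match)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each match score must be between 0.0 and 1.0")
    weighted_match = sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return 1.0 - weighted_match


def decide(risk, threshold=0.4):
    """Map a risk score to an action; the threshold is an arbitrary example."""
    if risk > threshold:
        return "ask extra security questions"
    return "authenticated"
```

With these example weights, a caller who matches well on all three signals gets a low risk score and is authenticated straight away, while a caller whose voice and phone both look wrong lands above the threshold and gets the extra questions.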