This Voicebot.ai podcast features ten short interviews from the CES conference. At the 28:30 mark, Bret talks to WillowTree’s Tobias Dengel, who discusses how voice’s big growth area will be multimodal experiences (typically meaning a combination of visual – think screen or text – and audio). Among Tobias’ comments were these:
– Most mobile apps will be voice-activated in a year or two
– Distinguish speaking from listening: humans want to speak to machines but receive information visually, via screen, text, etc. Note that in Star Trek, the ship’s computer doesn’t talk back to someone who gives a command
– We want transactional convenience. At CES, there was a demo of someone ordering pizza. For starters, it’s better to order on an app than on a website because the functionality is better. With an app, it takes 45 seconds to place an order. With voice, it drops to less than 10 seconds. But if the voice app repeats our order back by audio, we lose the benefit of that speed – whereas if it instead shows our order for confirmation by text or on a screen, we can keep the transaction to 15 seconds. That’s a significant time savings.
– Another example at CES came from the Mayo Clinic: right now, doctors take notes as they talk to you, which is distracting and a waste of time. What if instead your conversation with your doctor were recorded, relieving the doctor of the legal obligation to take notes?
– Another example is deciding to go to a movie – it’s much faster to review showtimes visually than hear them (who can forget Kramer in Seinfeld reading movie showtimes over the phone!), but then complete the purchase of tickets with a voice command.
– Bret notes that Tobias’ view is somewhat contrarian: we don’t need more immersive experiences – we need better transactions. Tobias notes that Google says this is the year of screens paired with voice. The explosion of multimodal.
– Tobias opened with the psychological perception of voice. Right now, the problem with voice is that people still don’t trust it. The reason is that people evaluate voice apps as if they were humans rather than machines – and they need to change that view, because voice apps aren’t human.