A layperson’s exploration of all things voice

May 1, 2019

How to Improve the “Discoverability” of Voice

At this week’s SpeechTEK Conference, Bruce Balentine of Enterprise Integration Group gave an excellent presentation on “discoverability” for voice. This interview gives a sense of what Bruce talked about:

Q: Why is it difficult for users to discover functions and operations that they can perform using voice applications?

A: Users discover functions and operations in a GUI by freely exploring, because a GUI engages the sense of sight and exists within the three dimensions of space. Free exploration is less effective in a VUI, because a VUI engages the sense of hearing and exists within the single dimension of time. Users therefore easily become lost, and the passage of time exacts a higher penalty: more thinking, more confusion, an inability to return to known starting places, loss of context, and the risk of the dialogue ending abruptly.

Q: How can users apply what they know about current voice applications when using new voice applications?

A: Generally, they cannot. Users rarely carry what they know about current voice applications over to new ones; in other words, there is little transfer of learning. This is partly because of a lack of standards, which product designers eschew in favor of differentiation for the sake of “branding.”

It is also because the industry has prioritized very-large-vocabulary, freeform “natural language” over ergonomic issues such as error detection and recovery, fixed and learnable methods for backing up or skipping forward, consistent turn-taking rules, and user-machine-environment modeling for situated awareness. All of these are user interface subsets that lend themselves to standardization.

Q: Are frequently asked questions, user guides, and YouTube videos enough?

A: FAQs, user guides, and YouTube videos are not enough. External collateral and observation do have their place, but the most effective discovery technique is user exploration. That kind of learning is at odds with today’s inconsistent, opaque, and ill-considered surface designs, which unintentionally send misleading and contradictory cues that prevent users from forming an effective theory of the machine’s mind.

Q: What are the big takeaways from your SpeechTEK presentation?

A: The big takeaways from my presentation include a better understanding of the importance of timing, eye-opening detail about grounding junctures, the importance of user-initiated backup, and a subtle but interesting heuristic-development theory for empathic learning. Each of these contributes directly to discoverability in voice applications of all kinds.