A layperson’s exploration of all things voice

January 23, 2020

Voice on Browsers Shows the Push Towards Multimodal

This Voicebot.ai article describes how Mozilla has rolled out “Firefox Voice,” a voice search tool for its Firefox browser, in beta form. Here’s the article’s analysis of how it works right now:

Firefox Voice performs like a smart display voice assistant within the browser as an extension. Unlike a smart speaker, it doesn’t have a wake word, at least not yet, and is activated by clicking on the icon in the address bar. The tool is limited to the desktop version of the browser and only works in English. Once activated, the assistant will answer questions by using the default search engine or go to specific websites if it recognizes them. A brief test found that it will recognize big company names relatively quickly, but will do a search instead if it’s a more obscure website.

The more interesting aspect of the voice assistant is that it can manage the browser’s tabs and control media playback. For instance, it will start or stop a YouTube video and adjust the volume. Firefox Voice also understands requests for maps and translations and can even copy and paste text, although it can be slightly tricky to pick out which parts of a website to highlight for copying.

Combined with Google’s announcement that it will soon begin using Google Assistant instead of the existing voice tool in its Chrome browser, this shows that the move towards multimodal (i.e., using voice combined with screens) continues to grow…