For users who rely on voice as their primary mode of input, operating a computing device can be difficult because of false positives (e.g., speech by the user that was not intended as a command, or background noise such as a radio). Voice commands can also be difficult to decipher, in which case the voice-based accessibility service needs additional, clarifying user input to disambiguate them.
This publication describes techniques and procedures for utilizing gaze detection to enhance voice-based accessibility services on a computing device, such as a smartphone or computer. The computing device feeds camera images into a machine-learned model, which produces an estimated x-y coordinate of where the user is gazing on the device's display. If the model's output indicates that the user is looking at the display (i.e., giving the device attention), auditory commands are accepted; if the user is not giving the device attention, auditory commands can be ignored. The gaze estimate can also assist in disambiguation (e.g., between similar-sounding commands or identically titled functions). Finally, the techniques and procedures can serve as an alternative means of controlling scrolling on the device's display.
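The attention gate and the disambiguation step can be illustrated with a minimal Python sketch. It is not the implementation from the publication: the names `Target`, `is_attending`, and `resolve_command` are hypothetical, the gaze estimate is assumed to arrive as an (x, y) pair in display pixels (or `None` when the model finds no on-screen gaze), and picking the matching target nearest to the gaze point is one plausible disambiguation rule among several.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumed model output: (x, y) in display pixels, or None if no gaze was found.
Gaze = Optional[Tuple[float, float]]


@dataclass(frozen=True)
class Target:
    """A hypothetical on-screen control that a voice command can activate."""
    label: str
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float

    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


def is_attending(gaze: Gaze, display_w: float, display_h: float) -> bool:
    """Treat the user as attending only if the gaze point lies on the display."""
    if gaze is None:
        return False
    gx, gy = gaze
    return 0 <= gx < display_w and 0 <= gy < display_h


def resolve_command(
    spoken_label: str,
    gaze: Gaze,
    targets: Sequence[Target],
    display_w: float,
    display_h: float,
) -> Optional[Target]:
    """Return the target to activate, or None if the command should be ignored."""
    # Attention gate: ignore speech while the user is not looking at the display.
    if not is_attending(gaze, display_w, display_h):
        return None
    matches = [t for t in targets if t.label.lower() == spoken_label.lower()]
    if not matches:
        return None
    # Disambiguation: among identically titled targets, pick the one whose
    # center is closest to the estimated gaze point.
    gx, gy = gaze
    return min(
        matches,
        key=lambda t: (t.center()[0] - gx) ** 2 + (t.center()[1] - gy) ** 2,
    )


if __name__ == "__main__":
    buttons = [
        Target("Share", 0, 0, 100, 50),
        Target("Share", 0, 500, 100, 50),
    ]
    chosen = resolve_command("share", (40.0, 510.0), buttons, 1080, 1920)
    print(chosen)
```

In this sketch, a spoken "share" with the gaze near the lower button resolves to that button, while the same utterance with no on-screen gaze is dropped. A production system would likely smooth the gaze estimate over several frames rather than act on a single sample.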
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Crain, Matthew; Xu, Pingmei; Salo, Alex; and Shekel, Tomer, "Utilizing Gaze Detection to Enhance Voice-Based Accessibility Services", Technical Disclosure Commons, (June 18, 2019)