Hotword-less virtual assistant interaction, referred to as look-to-talk, can be triggered when the user is within a certain distance of a device that provides the virtual assistant and looks at the device. With user permission, machine learning (ML) models analyze real-time video, audio, and text to determine whether a given user utterance is intended as an instruction to the virtual assistant or is directed at someone else in the room. This disclosure describes look-to-talk functionality that is applicable not only to stationary devices, but also to mobile devices such as smartphones or tablets. The techniques detect changes in device orientation or mode to trigger look-to-talk functionality. Because sensor data captured by a stationary device can differ from that captured by a mobile device, look-to-talk parameters and ML models are adapted to the device orientation and mode.
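
The adaptation of look-to-talk parameters to device orientation or mode can be illustrated with a minimal sketch. Everything in it is a labeled assumption rather than part of the disclosure: the two device states, the parameter names (`max_distance_m`, `min_gaze_seconds`, `model_name`), the threshold values, and the `should_trigger` helper are hypothetical and stand in for whatever a real system would measure or learn.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceState(Enum):
    """Hypothetical device orientations/modes; the disclosure does not enumerate them."""
    STATIONARY = "stationary"  # e.g. docked or propped up on a stand
    HANDHELD = "handheld"      # e.g. held in the user's hand


@dataclass(frozen=True)
class LookToTalkParams:
    max_distance_m: float    # assumed: farthest user distance at which a look counts
    min_gaze_seconds: float  # assumed: how long the user must look at the device
    model_name: str          # assumed: identifier of the ML model variant to load


# Illustrative values only; real thresholds would be tuned per device and mode.
PARAMS_BY_STATE = {
    DeviceState.STATIONARY: LookToTalkParams(1.5, 0.5, "stationary-model"),
    DeviceState.HANDHELD: LookToTalkParams(0.6, 0.8, "handheld-model"),
}


def should_trigger(state: DeviceState, distance_m: float, gaze_seconds: float) -> bool:
    """Return True if the user is close enough and has looked long enough
    under the parameters selected for the current device state."""
    params = PARAMS_BY_STATE[state]
    return (distance_m <= params.max_distance_m
            and gaze_seconds >= params.min_gaze_seconds)
```

In this sketch, a change in orientation or mode only swaps which parameter set and model variant are active. The same user position and gaze duration can therefore trigger look-to-talk on a docked device but not on a handheld one.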

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.