Automatic speech recognition (ASR) models are used to recognize user commands or queries in smartphones, smart speakers and smart displays, and other products that enable speech interaction. Automatic speech recognition is a complex problem that requires correct processing of both the acoustic and the semantic signals in the voice input. Natural language understanding (NLU) systems sometimes fail to correctly interpret utterances that are associated with multiple possible intents. Per the techniques described herein, device context features, such as the identity of the foreground application, along with other information, are used to disambiguate the intent of a voice query. Incorporating device context as an input to NLU models improves their ability to correctly interpret utterances with ambiguous intent.
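To make the idea concrete, here is a toy sketch of how a foreground-application feature can break a tie between two intents. It is not the method of this disclosure. It assumes a simple keyword-overlap intent scorer and additive context boosts. The intent names (`play_music`, `play_video`), app identifiers (`music_player`, `video_player`), and boost weights are all hypothetical. A real NLU model would learn how context features interact with utterance features rather than use hand-set values.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical intents and the keywords that signal them (illustration only).
INTENT_KEYWORDS: Dict[str, set] = {
    "play_music": {"play", "song", "album"},
    "play_video": {"play", "watch", "movie"},
}

# Hypothetical per-app boosts: how strongly each foreground app favors an intent.
# In a trained model these would be learned, not hand-set.
CONTEXT_BOOST: Dict[str, Dict[str, float]] = {
    "music_player": {"play_music": 1.0},
    "video_player": {"play_video": 1.0},
}


@dataclass
class DeviceContext:
    """Device context feature: the identity of the foreground application."""
    foreground_app: str


def score_intents(utterance: str, context: Optional[DeviceContext] = None) -> Dict[str, float]:
    """Score each intent by keyword overlap, then add any context boost."""
    tokens = set(utterance.lower().split())
    scores = {intent: float(len(tokens & kws)) for intent, kws in INTENT_KEYWORDS.items()}
    if context is not None:
        # Unknown apps contribute no boost, so they leave the scores unchanged.
        for intent, boost in CONTEXT_BOOST.get(context.foreground_app, {}).items():
            scores[intent] += boost
    return scores


def classify(utterance: str, context: Optional[DeviceContext] = None) -> Optional[str]:
    """Return the top-scoring intent, or None if the top score is tied (ambiguous)."""
    scores = score_intents(utterance, context)
    best = max(scores.values())
    winners = [intent for intent, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    print(classify("play thriller"))                                 # None: ambiguous
    print(classify("play thriller", DeviceContext("video_player")))  # play_video
```

Without context, "play thriller" matches both intents equally and is left unresolved. When the foreground application is a video player, the added context feature tips the decision toward `play_video`.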

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.