Processing a user’s voice command includes parsing the command to identify the components, or arguments, it refers to. These arguments are then mapped to items or objects in the real world in a process known as “grounding.” In some cases, transcription inaccuracies prevent a virtual assistant or other application from grounding the arguments accurately, leaving it unable to service the user’s command. This disclosure describes techniques to improve grounding by taking into account the top N highest-likelihood transcriptions of a user’s voice command along with contextual information accessed with the user’s permission. Better grounding lets a virtual assistant or other application interpret the command accurately, which improves the user experience.
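The idea of combining the top N transcriptions with context can be illustrated with a minimal Python sketch. It is not the implementation from the disclosure: the `Hypothesis` and `Grounding` types, the `ground` function, the string-similarity measure, the multiplicative scoring rule, and the 0.8 threshold are all assumptions chosen for illustration. The sketch extracts an argument from each transcription hypothesis, compares it against context entities (for example, contact names accessed with the user’s permission), and keeps the best match weighted by the transcription likelihood.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional


@dataclass
class Hypothesis:
    """One speech transcription candidate (hypothetical type)."""
    text: str
    score: float  # transcription likelihood, assumed to lie in [0, 1]


@dataclass
class Grounding:
    """Result of grounding an argument to a context entity (hypothetical type)."""
    entity: str
    source_text: str
    combined_score: float


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ground(
    hypotheses: Iterable[Hypothesis],
    context_entities: Iterable[str],
    extract_argument: Callable[[str], Optional[str]],
    min_similarity: float = 0.8,  # assumed threshold, not from the disclosure
) -> Optional[Grounding]:
    """Return the best-scoring context entity across all hypotheses, or None."""
    entities = list(context_entities)
    best: Optional[Grounding] = None
    for hyp in hypotheses:
        argument = extract_argument(hyp.text)
        if argument is None:
            continue
        for entity in entities:
            sim = similarity(argument, entity)
            if sim < min_similarity:
                continue
            # Assumed scoring rule: likelihood times context similarity.
            combined = hyp.score * sim
            if best is None or combined > best.combined_score:
                best = Grounding(entity, hyp.text, combined)
    return best


def contact_from_call_command(text: str) -> Optional[str]:
    """Toy parser: treat everything after 'call ' as the contact argument."""
    prefix = "call "
    return text[len(prefix):] if text.lower().startswith(prefix) else None


# Example: the top transcription alone cannot be grounded to a contact,
# but a lower-ranked hypothesis matches an entry in the contact list.
n_best = [
    Hypothesis("call an", 0.6),
    Hypothesis("call Ann", 0.3),
    Hypothesis("call Anne", 0.1),
]
contacts = ["Anne", "Bob"]

top_1_result = ground(n_best[:1], contacts, contact_from_call_command)
top_n_result = ground(n_best, contacts, contact_from_call_command)
```

In this toy example, grounding with only the single most likely transcription finds no contact, while considering all three hypotheses resolves the command to the contact “Anne”. A real system would likely use a proper parser and phonetic or learned similarity rather than plain string matching.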
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Faruqui, Manaal; Verma, Vishal; and Gupta, Aditya, "Improved Contextual Grounding by Combining Multiple Speech Transcription Hypotheses", Technical Disclosure Commons, (February 19, 2021)