Automatic speech recognition (ASR) models are used to recognize voice commands or queries from users in hardware products such as smartphones and smart speakers/displays, as well as in applications that enable speech interaction, e.g., virtual assistant applications. However, query abandonment rates for voice queries continue to be much higher than those for text queries, often because the spoken query is interpreted incorrectly. This disclosure describes techniques to improve the recognition of spoken queries by combining user-specific phonetic variations and session-specific contextual signals, obtained with specific user permission.
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Sodhi, Sukhdeep; Kumar, Ankit; Singh, Sarvjeet; Khan, Tameen; Apte, Ajit; and Jeje, Ayooluwakunmi, "Improving Automatic Speech Recognition by Co-embedding Voice Queries and Voice Query Refinements", Technical Disclosure Commons (October 12, 2020)