Virtual assistants, as provided via devices such as smart displays or smart speakers, utilize automatic speech recognition (ASR) techniques to interpret spoken commands. ASR relies on a language model trained on a corpus that includes spoken variations of commonly expected commands. However, such training does not account for words or phrases that are domain specific, e.g., names of contacts (used for calling) or names of media items or artists (used for media playback). Virtual assistants can therefore fail to interpret such commands at a high rate. This disclosure describes techniques to automatically switch to a class-based language model (CLM) when specific command domains are detected in spoken queries. The CLM utilizes available user data, e.g., the user's contact list, to constrain interpretation of spoken commands and can therefore achieve high accuracy without prior training on those domain-specific terms. The use of a CLM for query interpretation enables the virtual assistant to provide accurate responses.
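The core idea above — detect a command domain, then constrain interpretation to a class populated from user data — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the disclosed implementation: the domain detector is a simple pattern match for a "call" command, the contact list is hard-coded example data, and class constraint is approximated by snapping the recognized name to the nearest contact by edit distance.

```python
import re

# Hypothetical example user data; in practice this class would be
# populated from the user's actual contact list.
CONTACTS = ["Anya Sharma", "Bo Chen", "Priya Patel"]

# Simplified domain detector: recognize the "call" command domain.
CALL_PATTERN = re.compile(r"^call (?P<name>.+)$", re.IGNORECASE)

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used to score a hypothesis against the class."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def interpret(utterance: str):
    """If the 'call' domain is detected, constrain the spoken name to the
    contact class; otherwise return None to fall back to the general model."""
    m = CALL_PATTERN.match(utterance.strip())
    if m is None:
        return None
    heard = m.group("name").lower()
    # Class-based constraint: the answer must be one of the user's contacts.
    return min(CONTACTS, key=lambda c: edit_distance(heard, c.lower()))

print(interpret("call anya sharma"))  # exact match within the contact class
print(interpret("call prya patel"))   # mis-recognized name snapped to "Priya Patel"
print(interpret("play some jazz"))    # outside the call domain -> None
```

A production system would instead bias the ASR decoder itself (e.g., by expanding a class token in the language model with the contact entries), but the sketch shows why no prior training on the names is needed: the candidate space is constrained to the user's own data at interpretation time.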

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.