Virtual assistants implemented on a client device can recognize and fulfill spoken queries locally on the device, e.g., using trained machine learning models. However, on-device query processing requires substantial processing power and memory. Heavy use of processor and/or memory resources can slow down the entire device, including the handling of queries whose processing is offloaded to an external server. This disclosure describes techniques to improve the performance of local processing of voice-based user queries via a two-stage serial pipeline of trained machine learning models. The first stage is a limited-scope, high-efficiency model that recognizes and fulfills only the user’s most frequent queries; the second stage is the regular full-scope local query processing model, which can recognize and fulfill the full range of queries.
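The serial arrangement of the two stages can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the disclosure: queries are represented as already-transcribed text rather than audio, the first stage is stood in for by a simple lookup table rather than a trained model, and a query the first stage does not recognize is assumed to be passed on to the second stage. All class and function names (`FrequentQueryStage`, `FullScopeStage`, `process_query`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Result:
    stage: str     # which stage handled the query: "first" or "second"
    response: str  # fulfillment output


class FrequentQueryStage:
    """Hypothetical first stage: covers only a small set of frequent queries.

    A dictionary lookup stands in for the limited-scope model (assumption).
    """

    def __init__(self, handlers: Dict[str, Callable[[], str]]):
        self._handlers = handlers

    def try_fulfill(self, query: str) -> Optional[Result]:
        # Normalize case and whitespace before matching.
        key = " ".join(query.lower().split())
        handler = self._handlers.get(key)
        if handler is None:
            return None  # out of scope for this stage
        return Result(stage="first", response=handler())


class FullScopeStage:
    """Hypothetical second stage: stand-in for the full-scope local model."""

    def fulfill(self, query: str) -> Result:
        return Result(stage="second", response=f"full model handled: {query}")


def process_query(query: str,
                  first: FrequentQueryStage,
                  second: FullScopeStage) -> Result:
    # Try the cheap stage first; fall through to the full model on a miss.
    result = first.try_fulfill(query)
    if result is not None:
        return result
    return second.fulfill(query)


if __name__ == "__main__":
    first = FrequentQueryStage({"turn on the lights": lambda: "lights on"})
    second = FullScopeStage()
    print(process_query("Turn on the lights", first, second))
    print(process_query("What is the capital of Peru", first, second))
```

In this sketch the expensive second stage runs only when the first stage returns no result, which is the point of ordering the stages serially: frequent queries avoid the cost of the full model.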

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.