A digital assistant that relies on server-side query processing to interpret a query and provide an appropriate response can suffer failures or delays in query fulfillment. The delay worsens under poor network conditions, e.g., when a user issues queries from a moving vehicle. This disclosure leverages the observation that users repeat many queries verbatim. The described techniques, with user permission, use an on-device query cache to reduce query fulfillment latency. Spoken commands are matched against prior queries stored in the on-device cache; if a match is found, the query is fulfilled locally. If no match is found, the query is processed server-side and then added to the on-device cache. The cache is updated based on user-permitted factors such as query recency and query frequency.
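
The lookup-then-fallback flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the disclosure's implementation: the names `QueryCache`, `CacheEntry`, `fulfill`, and `ask_server` are hypothetical, the case and whitespace normalization is an added assumption, and the eviction rule (drop the least frequently used entry, breaking ties by least recent use) is just one plausible way to combine query frequency and recency.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CacheEntry:
    response: str   # stored result of fulfilling the query
    hits: int       # how often the query has been seen (frequency)
    last_used: int  # logical timestamp of the most recent use (recency)


class QueryCache:
    """Hypothetical on-device cache; assumes the user has permitted storage."""

    def __init__(self, capacity: int = 100) -> None:
        self.capacity = capacity
        self._entries: Dict[str, CacheEntry] = {}
        self._clock = 0  # logical clock, incremented on every hit or insert

    @staticmethod
    def _key(query: str) -> str:
        # Assumption: ignore case and extra whitespace when matching.
        return " ".join(query.lower().split())

    def lookup(self, query: str) -> Optional[str]:
        entry = self._entries.get(self._key(query))
        if entry is None:
            return None
        self._clock += 1
        entry.hits += 1
        entry.last_used = self._clock
        return entry.response

    def add(self, query: str, response: str) -> None:
        key = self._key(query)
        if key not in self._entries and len(self._entries) >= self.capacity:
            # Assumed policy: evict the least frequent entry, oldest first on ties.
            victim = min(
                self._entries,
                key=lambda k: (self._entries[k].hits, self._entries[k].last_used),
            )
            del self._entries[victim]
        self._clock += 1
        self._entries[key] = CacheEntry(response, 1, self._clock)


def fulfill(query: str, cache: QueryCache, ask_server: Callable[[str], str]) -> str:
    """Serve from the local cache on a match; otherwise ask the server and cache."""
    cached = cache.lookup(query)
    if cached is not None:
        return cached  # fulfilled locally, no network round trip
    response = ask_server(query)  # server-side query processing
    cache.add(query, response)
    return response
```

In this sketch, a repeated query never reaches `ask_server`, which is where the latency saving would come from; a real system would also need to decide which responses remain valid when replayed from the cache.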

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.