The content displayed on a device screen is often indicative of a user task and, with user permission, can be analyzed to surface suggestions that enable users to perform such tasks quickly. However, rules-based systems that rely on pre-stored knowledge about text and other content may fail to correctly recognize and surface the appropriate task, and they need to be modified for different user contexts. This disclosure describes the use of a large language model (LLM), e.g., an on-device model, to analyze on-screen content and its context to automatically determine user intent. The reasoning capabilities of the LLM enable reliable detection without the need for intent-specific training. The LLM output can be post-processed to surface user interface elements that, upon user selection, enable the user to complete the task associated with the detected intent. On-device text recognition and LLM inference enable intent detection in a reliable and privacy-preserving manner.
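The pipeline described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the intent set, prompt wording, and `run_llm` stub (standing in for any on-device inference API) are all hypothetical, and the post-processing simply validates the LLM's structured reply before mapping it to a UI suggestion.

```python
import json

# Hypothetical intent catalog; a real product would define its own set of
# supported intents and the UI actions they map to.
SUPPORTED_INTENTS = {
    "set_reminder": "Create reminder",
    "add_contact": "Add to contacts",
    "track_package": "Track package",
}


def build_prompt(screen_text: str) -> str:
    """Compose a prompt asking the LLM to classify the user's likely task."""
    return (
        "Analyze the following on-screen text and identify the user's likely task.\n"
        f"Screen text: {screen_text}\n"
        'Respond with JSON: {"intent": one of '
        + ", ".join(SUPPORTED_INTENTS)
        + ', "entity": the relevant text span}'
    )


def run_llm(prompt: str) -> str:
    """Placeholder for an on-device LLM call; returns a canned JSON reply here."""
    return '{"intent": "track_package", "entity": "1Z999AA10123456784"}'


def postprocess(llm_output: str):
    """Validate the LLM's JSON reply and map it to a UI suggestion, or None."""
    try:
        parsed = json.loads(llm_output)
    except json.JSONDecodeError:
        return None
    intent = parsed.get("intent")
    if intent not in SUPPORTED_INTENTS:
        return None  # Unknown or malformed intent: surface no suggestion.
    return {"label": SUPPORTED_INTENTS[intent], "entity": parsed.get("entity")}


screen_text = "Your order has shipped. Tracking: 1Z999AA10123456784"
suggestion = postprocess(run_llm(build_prompt(screen_text)))
```

Keeping the model's output constrained to a structured schema, and rejecting replies that fall outside the supported intent set, is what makes the post-processing step robust enough to drive user interface elements directly.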

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.