Abstract
Traditional interaction models for wearable displays often cause physical fatigue, social friction, and unintended activations, while fully autonomous execution based on inferred intent may lead to high-stakes errors. This disclosure describes a method for predictive intent proposal and non-verbal confirmation using multimodal data analysis. The method ingests video streams, eye-vector data, and inertial measurement unit (IMU) data, and converts raw eye movements into semantic tokens. An on-device multimodal large language model processes these tokens together with the visual scene to infer a likely user intent. A non-intrusive augmented reality overlay then proposes a specific action. The action is executed when a subtle head gesture, such as a pitch change, is detected, and is dismissed via a yaw change or a gaze shift. This process provides a hands-free, discreet mechanism for maintaining human authority over predicted digital or physical device actions.
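The confirmation step described above might be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the names `HeadSample`, `Decision`, and `classify_response`, the per-sample `gaze_on_overlay` flag, and the threshold values of 8 and 10 degrees are all assumptions introduced here. The sketch compares each head-pose sample against the pose at the moment the overlay appeared, treating a sufficient pitch change as confirmation and a sufficient yaw change or a gaze shift away from the overlay as dismissal.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Decision(Enum):
    CONFIRM = "confirm"
    DISMISS = "dismiss"
    PENDING = "pending"


@dataclass(frozen=True)
class HeadSample:
    """One fused IMU and eye-tracking reading (hypothetical structure)."""
    pitch_deg: float       # head pitch in degrees
    yaw_deg: float         # head yaw in degrees
    gaze_on_overlay: bool  # True while the gaze rests on the proposal overlay


def classify_response(
    samples: Sequence[HeadSample],
    pitch_threshold_deg: float = 8.0,   # assumed value, not from the disclosure
    yaw_threshold_deg: float = 10.0,    # assumed value, not from the disclosure
) -> Decision:
    """Classify the user's response to a proposed action.

    The first sample is taken as the baseline pose at the moment the
    overlay was shown. Samples are scanned in time order and the first
    gesture that crosses a threshold decides the outcome.
    """
    if not samples:
        return Decision.PENDING

    baseline = samples[0]
    for sample in samples:
        # A gaze shift away from the overlay dismisses the proposal.
        if not sample.gaze_on_overlay:
            return Decision.DISMISS
        # A yaw change (head turn) also dismisses it.
        if abs(sample.yaw_deg - baseline.yaw_deg) >= yaw_threshold_deg:
            return Decision.DISMISS
        # A pitch change (subtle nod) confirms the proposed action.
        if abs(sample.pitch_deg - baseline.pitch_deg) >= pitch_threshold_deg:
            return Decision.CONFIRM

    # No decisive gesture yet; the overlay stays up and nothing executes.
    return Decision.PENDING
```

Returning `PENDING` by default reflects the goal of keeping the user in authority: in this sketch, a proposed action never executes unless an explicit confirming gesture is observed.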
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Labzovsky, Ilia and Yakar, Tamar, "Predictive Intent Proposal and Gestural Confirmation in Head-Worn Displays via Multimodal Large Language Model Analysis", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10427