Abstract
A single-shot gaze vector often fails to reflect user intent during multimodal queries due to saccadic noise, visual competition in dense environments, and the temporal lag between visual fixation and vocalization. This disclosure describes a method for multimodal query disambiguation that uses a temporal gaze buffer maintaining a rolling history of gaze vectors. When a query is received, a probabilistic spatio-temporal heatmap is generated from the buffered data. The heatmap represents attention density over a variable time window, which may be adjusted based on the query context or environmental classification. To identify specific objects of interest, image frames are captured and synchronized with the gaze data; the gaze vectors are then mapped onto the captured environment so that visual focus is accurately associated with detected objects. Gaze data can optionally be transformed into object-relative coordinates to maintain focus on moving targets across frames. By representing user attention as a distribution rather than a discrete point, query resolution becomes more robust to natural eye movements, improving the precision of intent recognition and reducing the need for user clarification.
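The buffering and heatmap-generation steps described above can be sketched as follows. This is a minimal illustrative sketch, not the disclosed implementation: the class name `TemporalGazeBuffer`, the grid resolution, the Gaussian spread, the exponential decay constant, and the normalized screen-coordinate convention are all assumptions introduced here for clarity.

```python
import numpy as np
from collections import deque


class TemporalGazeBuffer:
    """Rolling buffer of (timestamp, x, y) gaze samples, with x and y
    in normalized [0, 1] screen coordinates. Illustrative sketch only."""

    def __init__(self, maxlen=120):
        # deque with maxlen gives the "rolling history" behavior for free
        self.samples = deque(maxlen=maxlen)

    def add(self, t, x, y):
        self.samples.append((t, x, y))

    def heatmap(self, now, window_s=2.0, grid=32, sigma=1.5, decay=0.5):
        """Accumulate one Gaussian splat per sample inside the time window.

        Older samples are down-weighted exponentially, so the result is a
        spatio-temporal attention-density map rather than a single point.
        The window length window_s stands in for the variable, context-
        adjusted window in the disclosure.
        """
        h = np.zeros((grid, grid))
        ys, xs = np.mgrid[0:grid, 0:grid]
        for t, x, y in self.samples:
            age = now - t
            if age < 0 or age > window_s:
                continue  # outside the analysis window
            w = 0.5 ** (age / (decay * window_s))  # temporal decay weight
            cx, cy = x * (grid - 1), y * (grid - 1)
            h += w * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        total = h.sum()
        return h / total if total > 0 else h  # normalized attention density


# Usage: a steady fixation near the upper-right of the screen produces a
# heatmap peak at the corresponding grid cell.
buf = TemporalGazeBuffer()
for i in range(30):
    buf.add(t=i * 0.05, x=0.7, y=0.3)
hm = buf.heatmap(now=1.5)
peak = np.unravel_index(np.argmax(hm), hm.shape)  # (row, col) of densest cell
```

A query resolver would then intersect this density map with detected object regions from the synchronized image frames, scoring each candidate object by the attention mass it covers instead of by a single gaze point.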
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Hurwitz, John D. and Mahomed, Yaaseen, "Probabilistic User Intent Modeling based on Time-Window Gaze Analysis", Technical Disclosure Commons, (April 21, 2026)
https://www.tdcommons.org/dpubs_series/9869