Abstract

Today’s smart devices often feel slow and clunky because they wait for long pauses in speech before responding and ignore what happened just before you started talking. This paper introduces a way for devices to predict what you want by combining several clues. The system retains the last few seconds of audio captured before you speak and combines it with where you are looking in real time. By comparing these signals with your past habits, the device can infer your goal before you finish your sentence. This allows the system to start processing your request sooner, correct mistakes proactively, and stop waiting for awkward silences. As a result, interacting with technology feels faster and more natural, even if you hesitate or give only a partial command.
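The abstract does not say how the buffered audio, gaze, and past habits are combined. What follows is one minimal way to read it, assuming a fixed-length rolling audio buffer and a naive-Bayes-style fusion in which a user-history prior is multiplied by gaze and partial-speech likelihoods. `PreSpeechBuffer`, `fuse_intents`, `maybe_commit`, the intent labels, the probabilities, and the 0.8 early-commit threshold are all hypothetical and are not taken from the disclosure.

```python
from collections import deque


class PreSpeechBuffer:
    """Rolling buffer holding only the most recent `seconds` of audio frames.

    Hypothetical: frames are opaque objects and each covers `frame_ms`
    milliseconds; the disclosure only says "a few seconds" are kept.
    """

    def __init__(self, seconds: float, frame_ms: int = 20):
        capacity = int(seconds * 1000 / frame_ms)
        # A deque with maxlen silently discards the oldest frame once full.
        self.frames = deque(maxlen=capacity)

    def push(self, frame) -> None:
        self.frames.append(frame)

    def snapshot(self) -> list:
        # Copy taken at speech onset so pre-onset audio can be
        # processed retroactively.
        return list(self.frames)


def fuse_intents(prior: dict, gaze_lik: dict, speech_lik: dict,
                 floor: float = 1e-6) -> dict:
    """Posterior over intents, assuming (hypothetically) that gaze and partial
    speech are conditionally independent given the intent:
        P(intent | gaze, speech) ∝ P(intent) * P(gaze | intent) * P(speech | intent)
    `prior` stands in for the user-history prior; missing likelihoods get `floor`.
    """
    scores = {
        intent: p * gaze_lik.get(intent, floor) * speech_lik.get(intent, floor)
        for intent, p in prior.items()
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all intents have zero probability")
    return {intent: s / total for intent, s in scores.items()}


def maybe_commit(posterior: dict, threshold: float = 0.8):
    """Return the top intent if it is confident enough to act on early,
    otherwise None (keep listening). The threshold value is an assumption."""
    intent, p = max(posterior.items(), key=lambda kv: kv[1])
    return intent if p >= threshold else None


if __name__ == "__main__":
    buf = PreSpeechBuffer(seconds=2.0)  # 2 s / 20 ms = 100 frames
    for i in range(150):
        buf.push(i)  # integers stand in for audio frames
    print(len(buf.snapshot()), buf.snapshot()[0])  # 100 50

    # Hypothetical numbers: habit prior, gaze on a lamp, partial utterance.
    prior = {"lights_on": 0.6, "play_music": 0.3, "set_timer": 0.1}
    gaze = {"lights_on": 0.9, "play_music": 0.05, "set_timer": 0.05}
    speech = {"lights_on": 0.5, "play_music": 0.3, "set_timer": 0.2}
    post = fuse_intents(prior, gaze, speech)
    print(maybe_commit(post))  # lights_on
```

In this example the gaze evidence pushes the posterior for `lights_on` to about 0.98, so the request could be dispatched before the utterance ends; a less confident posterior would return `None` and the device would keep listening. Multiplying independent likelihoods is a simplifying choice; a learned fusion model could replace `fuse_intents` without changing the buffer or commit logic.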

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Wang, Jinpeng, "Multimodal Intent Prediction via Retroactive Context Fusion and User Memory Priors", Technical Disclosure Commons, (May 13, 2026)
https://www.tdcommons.org/dpubs_series/10109

Technical Disclosure Commons

Defensive Publications Series

Multimodal Intent Prediction via Retroactive Context Fusion and User Memory Priors