Defensive Publications Series

LATENT TEMPORAL REASONING IN MULTIMODAL LLMS USING SPECIALIZED TIME-GAP TOKENS AND SPATIOTEMPORAL SALIENCY PRUNING

Abstract

This disclosure describes a system that helps Artificial Intelligence (AI) models, such as foundation models, generative AI, or large language models (LLMs), watch and understand continuous video streams or non-stop broadcasts without exceeding their context window or losing track of time. AI models have a limited memory (the context window), and when large portions of an uninterrupted video are skipped to fit within it, they can suffer from temporal hallucination: confusion about when events occurred. The system addresses both problems. A dynamic saliency transformer acts as a smart filter that removes boring, low-saliency, or repetitive parts of a video, judged against an auto-calibrating reference point. To keep the timeline accurate, the system inserts special time-gap tokens into the data stream. These tokens act as digital bookmarks, time lags, or temporal offsets that tell the neural network exactly how much time passed during the skipped sections. Time-gap tokens allow the model to analyze long video streams efficiently while maintaining a precise understanding of the temporal distance between events and the total duration of the event.
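
The pruning-plus-bookmark idea in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the disclosure: the `Frame` and `TimeGapToken` types, the per-frame `saliency` score, the exponential moving average standing in for the auto-calibrating reference point, and the `margin` and `alpha` parameters. A real system would presumably compute saliency with a learned model and represent the gap as a token embedding rather than a Python object.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Frame:
    timestamp: float  # seconds since the start of the stream
    saliency: float   # hypothetical per-frame saliency score


@dataclass
class TimeGapToken:
    seconds_skipped: float  # elapsed time the model did not see


def prune_with_time_gaps(
    frames: List[Frame], margin: float = 0.1, alpha: float = 0.2
) -> List[Union[Frame, TimeGapToken]]:
    """Drop low-saliency frames and mark each dropped run with one token.

    Assumption: the reference point is an exponential moving average of
    recent saliency; a frame is kept only if it exceeds that baseline by
    `margin`. The first frame is always kept to anchor the timeline.
    """
    out: List[Union[Frame, TimeGapToken]] = []
    if not frames:
        return out

    first = frames[0]
    out.append(first)
    baseline = first.saliency
    last_kept_ts = first.timestamp
    skipped = False  # True while at least one frame has been pruned

    for frame in frames[1:]:
        if frame.saliency > baseline + margin:
            if skipped:
                # Record the time elapsed since the last frame the model saw.
                out.append(TimeGapToken(frame.timestamp - last_kept_ts))
                skipped = False
            out.append(frame)
            last_kept_ts = frame.timestamp
        else:
            skipped = True
        # Re-calibrate the reference point after every frame.
        baseline = (1 - alpha) * baseline + alpha * frame.saliency

    if skipped:
        # Trailing pruned frames still count toward the total duration.
        out.append(TimeGapToken(frames[-1].timestamp - last_kept_ts))
    return out
```

In this sketch a pruned run collapses into a single token carrying the elapsed seconds, so the sequence passed to the model stays short while the spacing between the kept frames remains recoverable.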

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Yakar, Tamar and Labzovsky, Ilia, "LATENT TEMPORAL REASONING IN MULTIMODAL LLMS USING SPECIALIZED TIME-GAP TOKENS AND SPATIOTEMPORAL SALIENCY PRUNING", Technical Disclosure Commons, (June 18, 2026)
https://www.tdcommons.org/dpubs_series/10516
