Abstract

Video understanding using multimodal large language models (LLMs) involves recognizing objects, actions, and scenes and extracting meaningful insights from video streams. To reduce the computational burden, a subset of frames sampled from the video is fed to the LLM. However, fixed-rate sampling can reduce the accuracy of LLM inference when the salient information is concentrated in a small set of frames, and is wasteful when the video has slow-moving scenes. This disclosure describes a dynamic subsampling technique that increases the likelihood of selecting the most salient frames. Specifically, attention-guided frame selection, 3D convolutional feature extraction, and entropy-based subspace projection are utilized to ensure that the most important information from the video is fed to the LLM. Compared to fixed-frame-rate sampling, the techniques reduce the number of frames the LLM must process while also improving inference accuracy.
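
The abstract does not include an implementation, but the pipeline it names can be illustrated. The following is a minimal PyTorch sketch under stated assumptions: a single Conv3d layer stands in for the 3D convolutional feature extractor, a dot-product attention scorer ranks frames, and an entropy-based frame budget is used as a simplified stand-in for the disclosed entropy-based subspace projection, which the abstract does not specify. The function name select_frames and all shapes and hyperparameters are illustrative assumptions, not the authors' design.

import torch
import torch.nn.functional as F

def select_frames(video: torch.Tensor, max_frames: int = 16) -> torch.Tensor:
    """Dynamically subsample salient frames from a video.

    video: (T, C, H, W) tensor of raw RGB frames. Returns the indices of
    the selected frames in temporal order. Illustrative sketch only; not
    the disclosed implementation.
    """
    T = video.shape[0]

    # 1) 3D convolutional feature extraction. A single untrained Conv3d
    #    stands in for a deeper spatiotemporal backbone.
    conv3d = torch.nn.Conv3d(in_channels=3, out_channels=8,
                             kernel_size=3, padding=1)
    with torch.no_grad():
        # Conv3d expects (N, C, T, H, W).
        feats = conv3d(video.permute(1, 0, 2, 3).unsqueeze(0))
    # Pool spatial dimensions to get one feature vector per frame: (T, 8).
    frame_feats = feats.mean(dim=(3, 4)).squeeze(0).permute(1, 0)

    # 2) Attention-guided frame scoring: scaled dot-product attention of
    #    each frame feature against the mean "query" of the whole clip.
    query = frame_feats.mean(dim=0, keepdim=True)                  # (1, 8)
    scores = (frame_feats @ query.T).squeeze(1) / frame_feats.shape[1] ** 0.5
    attn = F.softmax(scores, dim=0)                                # (T,)

    # 3) Entropy-based budget (assumed heuristic): near-uniform attention
    #    (high entropy, slow-moving scene) -> keep few frames; attention
    #    concentrated on a small set of frames (low entropy) -> keep more
    #    of the top-scored frames, up to max_frames.
    entropy = -(attn * attn.clamp_min(1e-9).log()).sum()
    max_entropy = torch.log(torch.tensor(float(T)))
    budget = int(max_frames * (1.0 - entropy / max_entropy).item() + 1)
    budget = max(1, min(budget, max_frames, T))

    # Keep the highest-attention frames and restore temporal order.
    return attn.topk(budget).indices.sort().values

# Example: a random 64-frame clip is reduced to at most 16 frames.
clip = torch.rand(64, 3, 32, 32)
print(select_frames(clip, max_frames=16))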

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
