Abstract
Automatically reformatting video for different displays, such as vertical screens, can challenge conventional techniques, which may have difficulty interpreting narrative context or artistic composition and can thus diminish visual continuity. This disclosure describes a feedback-driven pipeline that coordinates multiple specialized large language models (LLMs) in a multi-stage workflow: distinct LLMs can analyze a video for semantic meaning, generate a configurable frame-level crop plan, execute the crop (for example, with smooth motion), and inspect the output for potential flaws. If a potential defect is identified, a quality-assurance model can provide specific feedback to the planning model, triggering an iterative self-correction loop. This approach can provide a semantically aware video transformation that helps preserve key subjects, compositional quality, and temporal consistency when cropping content for different aspect ratios.
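The analyze / plan / execute / inspect loop described above can be sketched as follows. This is a minimal illustration only: the function names (analyze_semantics, plan_crop, execute_crop, inspect_output), the crop-window representation, and the stubbed model behavior are assumptions, standing in for the actual LLM calls described in the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class CropPlan:
    # Frame-level crop windows: (frame_index, x, y, width, height).
    # Representation is illustrative, not from the disclosure.
    windows: list = field(default_factory=list)

def analyze_semantics(video):
    """Stub for the analysis LLM: extracts semantic context (key subjects)."""
    return {"subjects": ["person"], "frames": len(video)}

def plan_crop(context, feedback=""):
    """Stub for the planning LLM: emits a frame-level crop plan,
    optionally revised using feedback from the QA stage."""
    offset = 10 if "shift" in feedback else 0  # toy self-correction
    windows = [(i, offset, 0, 608, 1080) for i in range(context["frames"])]
    return CropPlan(windows)

def execute_crop(video, plan):
    """Stub for the execution stage: applies the plan (e.g., with smooth motion)."""
    return [(frame, window) for frame, window in zip(video, plan.windows)]

def inspect_output(cropped):
    """Stub for the QA LLM: returns (ok, feedback). Here it flags a
    crop window at x == 0 as clipping the subject."""
    x = cropped[0][1][1]
    return (x > 0, "" if x > 0 else "subject clipped; shift window right")

def reformat_video(video, max_iters=3):
    """Feedback-driven loop: plan, crop, inspect, and re-plan on defects."""
    context = analyze_semantics(video)
    feedback = ""
    cropped = []
    for _ in range(max_iters):
        plan = plan_crop(context, feedback)
        cropped = execute_crop(video, plan)
        ok, feedback = inspect_output(cropped)
        if ok:
            return cropped
    return cropped  # best effort after max_iters
```

In this toy run, the first plan fails inspection, the QA feedback triggers a revised plan, and the second iteration passes, mirroring the iterative self-correction loop described in the disclosure.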
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Bakkali, Wafae and Kuligin, Leonid, "Feedback-Driven Multi-LLM Pipeline Using LLMs for Semantic Video Cropping", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/9776