Abstract
Automatically reformatting video for different displays, such as vertical screens, can challenge conventional techniques, which may have difficulty interpreting narrative context or artistic composition and can thus diminish visual continuity. This disclosure describes a feedback-driven pipeline that coordinates multiple specialized large language models (LLMs) in a multi-stage workflow: distinct LLMs can analyze a video for semantic meaning, generate a configurable frame-level crop plan, execute the crop (for example, with smooth motion), and inspect the output for potential flaws. If a potential defect is identified, a quality-assurance model can provide specific feedback to the planning model, triggering an iterative self-correction loop. This approach can provide a semantically aware video transformation that helps preserve key subjects, compositional quality, and temporal consistency when cropping content for different aspect ratios.
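The analyze / plan / execute / inspect loop described above can be sketched as follows. This is a minimal illustration only: the function names (analyze_semantics, plan_crop, execute_crop, inspect_output), the crop-window representation, and the stubbed model behavior are assumptions, standing in for the actual LLM calls described in the disclosure.

```python
from dataclasses import dataclass, field

@dataclass
class CropPlan:
    # Frame-level crop windows: (frame_index, x, y, width, height).
    # Representation is illustrative, not from the disclosure.
    windows: list = field(default_factory=list)

def analyze_semantics(video):
    """Stub for the analysis LLM: extracts semantic context (key subjects)."""
    return {"subjects": ["person"], "frames": len(video)}

def plan_crop(context, feedback=""):
    """Stub for the planning LLM: emits a frame-level crop plan,
    optionally revised using feedback from the QA stage."""
    offset = 10 if "shift" in feedback else 0  # toy self-correction
    windows = [(i, offset, 0, 608, 1080) for i in range(context["frames"])]
    return CropPlan(windows)

def execute_crop(video, plan):
    """Stub for the execution stage: applies the plan (e.g., with smooth motion)."""
    return [(frame, window) for frame, window in zip(video, plan.windows)]

def inspect_output(cropped):
    """Stub for the QA LLM: returns (ok, feedback). Here it flags a
    crop window at x == 0 as clipping the subject."""
    x = cropped[0][1][1]
    return (x > 0, "" if x > 0 else "subject clipped; shift window right")

def reformat_video(video, max_iters=3):
    """Feedback-driven loop: plan, crop, inspect, and re-plan on defects."""
    context = analyze_semantics(video)
    feedback = ""
    cropped = []
    for _ in range(max_iters):
        plan = plan_crop(context, feedback)
        cropped = execute_crop(video, plan)
        ok, feedback = inspect_output(cropped)
        if ok:
            return cropped
    return cropped  # best effort after max_iters
```

In this toy run, the first plan fails inspection, the QA feedback triggers a revised plan, and the second iteration passes, mirroring the iterative self-correction loop described in the disclosure.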
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Bakkali, Wafae and Kuligin, Leonid, "Feedback-Driven Multi-LLM Pipeline Using LLMs for Semantic Video Cropping", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/9776