Abstract
Video generation models can exhibit deficiencies in text-to-video alignment and instruction following. Some agentic systems address this by editing the initial user prompt. That approach can be less effective when a detailed prompt expander rewrites the user prompt before it reaches the video model. This disclosure describes an agent-based iterative refinement system that instead operates on the machine-generated expanded prompts provided to the video model. The method can use a self-improvement loop in which a critic component, for instance one or more large language models (LLMs), analyzes a generated video. Another LLM can then make targeted edits to the expanded prompt to address the identified flaws. A selection process, such as a pairwise tournament, then chooses a prompt for the next refinement cycle. This feedback mechanism may provide more granular control over the generation process, potentially improving video quality, alignment, and storytelling coherence without requiring model retraining.
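The critique, edit, and select loop summarized above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the disclosed implementation. The names `refine`, `pairwise_tournament`, `toy_critic`, `toy_editor`, and `toy_judge` are hypothetical. The toy functions stand in for the video model and the LLM calls. The sketch makes four further assumptions: the judge compares two candidate prompts (in practice it might compare the videos they produce), the unedited prompt competes alongside the edited variants, selection uses a single-elimination bracket, and the loop stops early once the critic reports no flaws.

```python
from typing import Callable, List

# Hypothetical signatures; a real system would wrap a video model and LLM calls.
Critic = Callable[[str, str], List[str]]       # (expanded_prompt, video) -> list of flaws
Editor = Callable[[str, List[str], int], str]  # (prompt, flaws, variant_id) -> edited prompt
Judge = Callable[[str, str], bool]             # (prompt_a, prompt_b) -> True if a wins


def pairwise_tournament(candidates: List[str], judge: Judge) -> str:
    """Single-elimination bracket: adjacent pairs meet, winners advance, an unpaired one gets a bye."""
    pool = list(candidates)
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if judge(a, b) else b)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # bye for the unpaired candidate
        pool = next_round
    return pool[0]


def refine(expanded_prompt: str,
           generate_video: Callable[[str], str],
           critique: Critic,
           edit_prompt: Editor,
           judge: Judge,
           rounds: int = 3,
           num_edits: int = 4) -> str:
    """Repeat: generate a video, critique it, propose targeted edits, pick a winner."""
    current = expanded_prompt
    for _ in range(rounds):
        video = generate_video(current)
        flaws = critique(current, video)
        if not flaws:  # stop early once the critic finds nothing to fix
            break
        # The unedited prompt stays in the bracket, so it can still win if no edit beats it.
        candidates = [current] + [edit_prompt(current, flaws, k) for k in range(num_edits)]
        current = pairwise_tournament(candidates, judge)
    return current


# Toy stand-ins (hypothetical): the "video" is just the prompt text, and the
# critic flags required details that are missing from it.
REQUIRED = ["red kite", "sunset", "slow pan"]


def toy_video(prompt: str) -> str:
    return prompt


def toy_critic(prompt: str, video: str) -> List[str]:
    return [detail for detail in REQUIRED if detail not in video]


def toy_editor(prompt: str, flaws: List[str], variant_id: int) -> str:
    # Each variant appends one missing detail, a stand-in for a targeted LLM edit.
    return f"{prompt}, {flaws[variant_id % len(flaws)]}"


def toy_judge(a: str, b: str) -> bool:
    def score(p: str) -> int:
        return sum(detail in p for detail in REQUIRED)
    return score(a) >= score(b)  # ties go to the first candidate


result = refine("A bird flies over hills", toy_video, toy_critic, toy_editor, toy_judge)
print(result)  # A bird flies over hills, red kite, sunset, slow pan
```

A single-elimination bracket needs only n - 1 judge calls for n candidates (4 calls for the 5 candidates per round here). A full round-robin would need n(n - 1)/2 calls. This sketch picks the bracket for that reason. The disclosure itself only names a pairwise tournament as one possible selection process.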
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Weisz, Ágoston; Vernikos, Giorgos; Bulski, Aleksander; Qian, Yanan; Dayanc, Mertay; and Liba, Orly, "Agent-Based Iterative Refinement of Expanded Prompts for Video Generation", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/9846