Abstract

Video generation models can exhibit deficiencies in text-to-video alignment and instruction following, and some agentic systems may be limited to editing the initial user prompt, an approach that can be less effective when a detailed prompt expander is used. This disclosure describes an agent-based iterative refinement system that operates on the machine-generated expanded prompts provided to a video model. The method can utilize a self-improvement loop in which a critic component, for instance one or more large language models (LLMs), analyzes a generated video. Another LLM can then perform targeted edits on the expanded prompt to address identified flaws, and a selection process, such as a pairwise tournament, chooses a prompt for the next refinement cycle. This feedback mechanism may provide more granular control over the generation process, potentially improving video quality, alignment, and storytelling coherence without a need for model retraining.
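The loop described above (generate, critique, edit, select) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: `generate_video`, `critique`, `edit_prompt`, and the tournament comparison are hypothetical stubs standing in for the video model, the critic LLM(s), and the editor LLM named in the abstract.

```python
def generate_video(prompt):
    # Hypothetical stand-in for a text-to-video model call.
    return f"video({prompt})"

def critique(video, prompt):
    # Hypothetical critic LLM: returns a list of identified flaws.
    # Stubbed: flags a flaw until the prompt addresses stability.
    return [] if "stable" in prompt else ["subject drifts off-screen"]

def edit_prompt(prompt, flaws):
    # Hypothetical editor LLM: targeted edit addressing the flaws,
    # rather than rewriting the prompt wholesale.
    return prompt + "; keep the subject stable and centered"

def tournament_select(candidates):
    # Pairwise tournament, stubbed here as preferring the candidate
    # whose generated video draws fewer critic flaws.
    best = candidates[0]
    for cand in candidates[1:]:
        if len(critique(generate_video(cand), cand)) < \
           len(critique(generate_video(best), best)):
            best = cand
    return best

def refine(expanded_prompt, rounds=3):
    """Self-improvement loop over the expanded prompt."""
    current = expanded_prompt
    for _ in range(rounds):
        video = generate_video(current)
        flaws = critique(video, current)
        if not flaws:
            break  # critic is satisfied; stop refining
        candidate = edit_prompt(current, flaws)
        current = tournament_select([current, candidate])
    return current

refined = refine("a dog running on a beach at sunset")
```

Because the loop operates on the expanded prompt rather than the user's original input, each cycle can make fine-grained corrections without retraining the video model.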

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
