Abstract
This publication describes techniques for reducing shot cuts in videos generated by diffusion models, particularly in image-to-video applications. Shot cuts are abrupt transitions between distinct scenes of a video. In generated video, shot cuts may be unexpected and irrelevant to the provided input. To reduce shot cuts, a diffusion model may begin the diffusion process from a static video, where the frames of the static video may be noised instances of an input image. The diffusion model may also insert the input image as the first frame of the generated video, anchoring the output to the input. By ‘anchoring’ the output video and leveraging concepts from Stochastic Differential Equation editing (SDEdit), the techniques may stabilize video generation and improve consistency with the initial prompt. Additionally, the diffusion model may generate the video based on various input parameters (e.g., hyperparameters specifying the noise level, the number of denoising steps, and/or the distribution of denoising steps). These parameters enable the user to exert greater control over the diffusion process and ultimately fine-tune the generated video. In this way, the techniques described in this disclosure may reduce shot cuts and other unexpected output, enabling diffusion models to produce videos more relevant to the provided input.
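The approach can be illustrated with a short sketch. The following Python example is a minimal illustration only, assuming a generic diffusers-style scheduler (with `set_timesteps`, `add_noise`, and `step`) and a hypothetical video denoising model callable `model(latents, t, image_condition=...)`; it is not the authors' implementation. It shows the SDEdit-style initialization described above: the input image is tiled into a static video, noised to an intermediate timestep controlled by a strength hyperparameter, denoised over the remaining steps, and inserted as the first frame to anchor the output.

```python
import torch

def generate_anchored_video(model, scheduler, input_image, num_frames=16,
                            noise_strength=0.6, num_inference_steps=50):
    # SDEdit-style initialization: start the reverse diffusion from a noised
    # "static video" built by repeating the input image, not from pure noise.
    # input_image: tensor of shape (C, H, W) in the model's working space.
    static_video = input_image.unsqueeze(0).repeat(num_frames, 1, 1, 1)  # (T, C, H, W)

    # noise_strength in [0, 1] selects an intermediate starting timestep:
    # lower values keep the output closer to the static video (fewer shot cuts),
    # higher values allow more motion but risk drifting from the input.
    scheduler.set_timesteps(num_inference_steps)
    t_start = int(num_inference_steps * (1.0 - noise_strength))
    timesteps = scheduler.timesteps[t_start:]

    # Forward-noise the static video to the chosen starting timestep.
    noise = torch.randn_like(static_video)
    latents = scheduler.add_noise(static_video, noise, timesteps[:1])

    # Reverse diffusion over the remaining steps, conditioned on the input image.
    for t in timesteps:
        noise_pred = model(latents, t, image_condition=input_image)
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Anchor the generated video: insert the input image as the first frame.
    latents[0] = input_image
    return latents  # (T, C, H, W), decoded downstream into final video frames
```

In this sketch, `noise_strength` plays the role of the hyperparameters mentioned above: it determines how much of the denoising schedule is applied and therefore how far the generated video may depart from the static, anchored starting point.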
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
McIntire, Mitchell; Yao, David; and Liba, Orly, "REDUCING SHOT CUTS IN DIFFUSION-GENERATED VIDEO", Technical Disclosure Commons, (August 01, 2025)
https://www.tdcommons.org/dpubs_series/8422