Diffusion-based video generation models have recently achieved significant success. However, existing models often suffer from weak temporal consistency and image quality that degrades over time. To address these problems, and inspired by aesthetic principles, we propose Uniform Frame Organizer (UFO), a non-invasive plug-in compatible with any diffusion-based video generation model. UFO comprises a series of adaptive adapters with adjustable intensities. Once integrated, it significantly improves the consistency between a video's foreground and background and raises image quality, all without altering the original model's parameters. Training UFO is simple and efficient, requires minimal resources, and supports stylized training. Its modular design allows multiple UFOs to be combined, enabling customized, personalized video generation models. UFO can also be transferred directly across different models of the same specification without retraining. Experimental results show that UFO effectively improves video generation quality and demonstrate its advantages on public video generation benchmarks. The code will be publicly available at https://github.com/Delong-liu-bupt/UFO.
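To make the idea of intensity-adjustable adapters on a frozen model concrete, the following is a minimal sketch and not the UFO implementation. It assumes a simple additive form, in which each adapter contributes a residual scaled by its own intensity. The abstract does not specify the actual adapter architecture or how intensities are applied, so `frozen_base_layer`, `make_adapter`, and `apply_with_adapters` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def frozen_base_layer(x: Vector) -> Vector:
    """Stand-in for one frozen layer of a pretrained video model (hypothetical)."""
    return [2.0 * v for v in x]


def make_adapter(weight: float) -> Layer:
    """Hypothetical adapter: a tiny residual branch with a single weight."""
    return lambda x: [weight * v for v in x]


def apply_with_adapters(
    base: Layer,
    adapters: Sequence[Layer],
    intensities: Sequence[float],
    x: Vector,
) -> Vector:
    """Return base(x) plus the intensity-weighted sum of adapter outputs.

    Assumed additive form: the base layer is only called, never modified,
    so its parameters stay untouched. Setting every intensity to 0
    recovers the original model's output exactly.
    """
    if len(adapters) != len(intensities):
        raise ValueError("one intensity per adapter is required")
    out = base(x)
    for adapter, alpha in zip(adapters, intensities):
        delta = adapter(x)
        out = [o + alpha * d for o, d in zip(out, delta)]
    return out
```

Under this assumption, combining several adapters amounts to passing more than one entry in `adapters`, each with its own intensity, and reusing the same adapters with a different `base` of matching shape mirrors the transfer across models of the same specification described in the abstract.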