Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own Correlation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks. It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a $10\times$ smaller model using significantly less computation than the prior art.

翻译：尽管利用扩散模型生成高质量图像取得了巨大进展，但合成既逼真又时间一致性的动画帧序列仍处于起步阶段。虽然存在现成的数十亿规模图像生成数据集，但收集同等规模的视频数据仍具挑战。此外，训练视频扩散模型的计算成本远高于图像扩散模型。本研究探索了一种实用解决方案：使用视频数据微调预训练图像扩散模型以完成视频合成任务。我们发现，直接将图像噪声先验扩展到视频扩散中的视频噪声先验会导致次优性能。我们精心设计的视频噪声先验显著提升了性能。大量实验验证表明，我们的模型Preserve Your Own Correlation (PYoCo) 在UCF-101和MSR-VTT基准测试中实现了零样本文本生成视频的最新结果。同时，该模型在小型UCF-101基准测试中，以缩小10倍的模型规模和显著低于现有技术的计算量，达到了视频生成质量的最新水平。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日