Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model

Multi-view or 4D video generation has recently emerged as a significant research topic. Nonetheless, existing approaches to 4D generation still struggle with fundamental limitations: they rely either on combining multiple video diffusion models with additional training, or on compute-intensive training of a full 4D diffusion model using limited real-world 4D data. To address these challenges, we propose the first training-free 4D video generation method that leverages off-the-shelf video diffusion models to generate multi-view videos from a single input video. Our approach consists of two key steps: (1) We designate the edge frames of the spatio-temporal sampling grid as key frames and synthesize them first using a video diffusion model, guided by a depth-based warping technique. This ensures structural consistency across the generated frames, preserving spatial and temporal coherence. (2) We then interpolate the remaining frames using a video diffusion model, constructing a fully populated sampling grid while preserving spatial and temporal consistency. Through this approach, we extend a single video into a multi-view video along novel camera trajectories while maintaining spatio-temporal consistency. Our method is training-free and relies entirely on an off-the-shelf video diffusion model, offering a practical and effective solution for multi-view video generation.

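The two-step schedule described in the abstract can be pictured as a labeling of the spatio-temporal sampling grid. The following is a minimal sketch, not the authors' implementation: the abstract only states that edge frames of the grid serve as key frames and that the remaining frames are interpolated. The grid layout (rows as camera views, columns as time steps), the choice of row 0 as the input video, and the name `label_sampling_grid` are all assumptions made for illustration.

```python
from typing import List


def label_sampling_grid(num_views: int, num_frames: int) -> List[List[str]]:
    """Label each cell of a (view, time) sampling grid.

    Assumed layout (hypothetical, not taken from the paper):
      - row 0 holds the frames of the single input video ("input")
      - every other boundary cell is a key frame ("key"), synthesized
        first with depth-based warping guidance
      - interior cells ("interp") are filled afterwards by interpolation
    """
    if num_views < 2 or num_frames < 2:
        raise ValueError("grid needs at least 2 views and 2 frames")

    grid: List[List[str]] = []
    for v in range(num_views):
        row: List[str] = []
        for t in range(num_frames):
            if v == 0:
                row.append("input")
            elif v == num_views - 1 or t == 0 or t == num_frames - 1:
                row.append("key")
            else:
                row.append("interp")
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Print a small 4-view, 5-frame grid, one row per camera view.
    for row in label_sampling_grid(4, 5):
        print(" ".join(f"{cell:>6}" for cell in row))
```

Under these assumptions, step (1) corresponds to generating every `key` cell and step (2) to filling every `interp` cell, so that the whole grid ends up populated.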
