4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency

Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. However, as these pipelines generate 4D content from text or image inputs, they incur significant time and effort in prompt engineering through trial and error. This work introduces 4DGen, a novel, holistic framework for grounded 4D content creation that decomposes the 4D generation task into multiple stages. We identify static 3D assets and monocular video sequences as key components in constructing the 4D content. Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos), thus offering superior control over content creation. Furthermore, we construct our 4D representation using dynamic 3D Gaussians, which permits efficient, high-resolution supervision through rendering during training, thereby facilitating high-quality 4D generation. Additionally, we employ spatial-temporal pseudo labels on anchor frames, along with seamless consistency priors implemented through 3D-aware score distillation sampling and smoothness regularizations. Compared to existing baselines, our approach yields competitive results in faithfully reconstructing input signals and realistically inferring renderings from novel viewpoints and timesteps. Most importantly, our method supports grounded generation, offering users enhanced control, a feature difficult to achieve with previous methods. Project page: https://vita-group.github.io/4DGen/
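The abstract mentions smoothness regularizations but does not give their form. As a purely illustrative sketch, and not the authors' formulation, one simple temporal smoothness term penalizes how far each Gaussian center moves between consecutive timesteps. The function name `temporal_smoothness` and the list-of-frames layout are hypothetical choices made for this example.

```python
def temporal_smoothness(trajectory):
    """Mean squared displacement of Gaussian centers between consecutive frames.

    Assumption (not from the paper): `trajectory` is a list of frames, and each
    frame is a list of (x, y, z) centers in the same Gaussian order every frame.
    """
    if len(trajectory) < 2:
        return 0.0  # a single frame has no temporal variation to penalize

    total = 0.0
    count = 0
    # Walk over pairs of neighbouring frames (t, t+1).
    for prev_frame, curr_frame in zip(trajectory, trajectory[1:]):
        # Compare the same Gaussian across the two frames.
        for prev_center, curr_center in zip(prev_frame, curr_frame):
            total += sum((c - p) ** 2 for p, c in zip(prev_center, curr_center))
            count += 1
    return total / count if count else 0.0


# One Gaussian moving 1 unit along x, then 2 units along y.
moving = [[(0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], [(1.0, 2.0, 0.0)]]
penalty = temporal_smoothness(moving)
```

In a real training loop a term like this would be computed on differentiable tensors and added to the rendering and score distillation losses with a weight; the pure-Python version here only shows the arithmetic.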

