4D content generation has made remarkable progress recently. However, existing methods suffer from long optimization times, limited motion controllability, and low detail quality. In this paper, we introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS). Our key insight is that combining explicit modeling of spatial transformations with static GS yields an efficient and powerful representation for 4D generation. Moreover, video generation methods can offer valuable spatial-temporal priors that improve the quality of 4D generation. Specifically, we propose an integrated framework with two major modules: 1) Image-to-4D GS: we first generate static GS with DreamGaussianHD, followed by HexPlane-based dynamic generation with Gaussian deformation; and 2) Video-to-Video Texture Refinement: we refine the generated UV-space texture maps while enhancing their temporal consistency using a pre-trained image-to-video diffusion model. Notably, DG4D reduces the optimization time from several hours to just a few minutes, allows the generated 3D motion to be visually controlled, and produces animated meshes that can be realistically rendered in 3D engines.
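To make the idea of HexPlane-based Gaussian deformation more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the deformation field stores six 2D feature planes over the axis pairs of (x, y, z, t), samples them bilinearly at each Gaussian center, and maps the concatenated features to a position offset. The class name `ToyHexPlaneDeformation`, the plane resolution, the channel count, the concatenation-based fusion, and the single linear head are all illustrative assumptions; a full model would likely also predict rotation and scale changes and would be trained rather than randomly initialized.

```python
import numpy as np

# Axis pairs for the six planes over (x, y, z, t):
# three spatial planes (xy, xz, yz) and three spatio-temporal planes (xt, yt, zt).
PLANE_AXES = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]


def bilinear_sample(plane, uv):
    """Sample a (R, R, C) feature plane at N points with coordinates uv in [-1, 1]."""
    res = plane.shape[0]
    # Map [-1, 1] to continuous grid indices in [0, res - 1].
    grid = (np.clip(uv, -1.0, 1.0) + 1.0) * 0.5 * (res - 1)
    lo = np.clip(np.floor(grid).astype(int), 0, res - 2)
    frac = grid - lo
    u0, v0 = lo[:, 0], lo[:, 1]
    fu, fv = frac[:, 0:1], frac[:, 1:2]
    return (
        plane[u0, v0] * (1 - fu) * (1 - fv)
        + plane[u0 + 1, v0] * fu * (1 - fv)
        + plane[u0, v0 + 1] * (1 - fu) * fv
        + plane[u0 + 1, v0 + 1] * fu * fv
    )


class ToyHexPlaneDeformation:
    """Hypothetical toy deformation field (illustration only, untrained)."""

    def __init__(self, res=16, channels=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = [rng.normal(0.0, 0.1, (res, res, channels)) for _ in PLANE_AXES]
        # Assumed linear head: concatenated plane features -> xyz offset.
        self.head = rng.normal(0.0, 0.1, (len(PLANE_AXES) * channels, 3))

    def offsets(self, xyz, t):
        """xyz: (N, 3) in [-1, 1]; t: scalar in [0, 1]. Returns (N, 3) offsets."""
        t_col = np.full((xyz.shape[0], 1), 2.0 * t - 1.0)  # rescale time to [-1, 1]
        xyzt = np.concatenate([xyz, t_col], axis=1)
        feats = [
            bilinear_sample(plane, xyzt[:, list(axes)])
            for plane, axes in zip(self.planes, PLANE_AXES)
        ]
        return np.concatenate(feats, axis=1) @ self.head

    def deform(self, xyz, t):
        """Move static Gaussian centers to their positions at time t."""
        return xyz + self.offsets(xyz, t)


if __name__ == "__main__":
    field = ToyHexPlaneDeformation()
    centers = np.random.default_rng(1).uniform(-1.0, 1.0, (4, 3))  # static Gaussian centers
    print(field.deform(centers, t=0.25).shape)
```

The static Gaussians stay fixed in this sketch; only the per-timestamp offsets change, which is one way to read "explicit modeling of spatial transformations" on top of static GS.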