We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands the input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques, conditioned on renderings of the static 3D scene from sampled camera trajectories. We then optimize a canonical 4D scene representation using the animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient dynamics. Experiments demonstrate that VividDream provides human viewers with compelling 4D experiences generated from diverse real images and text prompts.