DreamSparse: Escaping from Plato's Cave with 2D Diffusion Model Given Sparse Views

Synthesizing novel-view images from a few views is a challenging but practical problem. Existing methods often struggle to produce high-quality results, or require per-object optimization, in such few-view settings because the input provides insufficient information. In this work, we explore leveraging the strong 2D priors in pre-trained diffusion models for synthesizing novel-view images. 2D diffusion models, however, lack 3D awareness, which leads to distorted image synthesis and compromises object identity. To address these problems, we propose DreamSparse, a framework that enables a frozen pre-trained diffusion model to generate geometry- and identity-consistent novel-view images. Specifically, DreamSparse incorporates a geometry module designed to capture 3D features from sparse views as a 3D prior. A spatial guidance model is then introduced to convert these 3D feature maps into spatial information for the generative process. This information is used to guide the pre-trained diffusion model, enabling it to generate geometrically consistent images without fine-tuning. Leveraging the strong image priors in pre-trained diffusion models, DreamSparse is capable of synthesizing high-quality novel views for both object-level and scene-level images and of generalizing to open-set images. Experimental results demonstrate that our framework can effectively synthesize novel-view images from sparse views and outperforms baselines on both trained-category and open-set-category images. More results can be found on our project page: https://sites.google.com/view/dreamsparse-webpage.
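
The data flow described in the abstract (geometry module, then spatial guidance, then a frozen diffusion model) can be illustrated with a toy sketch. This is not the authors' implementation: the names `geometry_module`, `SpatialGuidance`, `FrozenDenoiser`, and `synthesize` are hypothetical, feature maps are reduced to flat lists of floats, and the weighted average and additive guidance are assumptions chosen only to make the structure concrete.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]  # stand-in for a flattened feature map


def geometry_module(view_features: List[Vector], weights: List[float]) -> Vector:
    """Toy 3D prior (assumption): weighted average of per-view features."""
    total = sum(weights)
    dim = len(view_features[0])
    return [
        sum(w * f[i] for w, f in zip(weights, view_features)) / total
        for i in range(dim)
    ]


@dataclass
class SpatialGuidance:
    """Hypothetical trainable adapter: maps the 3D prior to a guidance signal."""
    scale: float = 0.5  # the only trainable parameter in this toy

    def __call__(self, prior: Vector) -> Vector:
        return [self.scale * p for p in prior]


@dataclass(frozen=True)
class FrozenDenoiser:
    """Stand-in for the pre-trained diffusion model; its parameter is immutable."""
    decay: float = 0.9

    def step(self, x: Vector, guidance: Vector) -> Vector:
        # The guidance is added to the frozen model's own update (assumption).
        return [self.decay * xi + gi for xi, gi in zip(x, guidance)]


def synthesize(view_features: List[Vector], weights: List[float],
               x_init: Vector, steps: int = 3) -> Vector:
    """Run the toy pipeline: prior -> guidance -> guided frozen steps."""
    prior = geometry_module(view_features, weights)
    guide = SpatialGuidance()
    model = FrozenDenoiser()
    x = list(x_init)
    for _ in range(steps):
        x = model.step(x, guide(prior))
    return x
```

In this sketch only `SpatialGuidance.scale` would be updated during training, while `FrozenDenoiser` is declared immutable, mirroring the abstract's point that the pre-trained model is guided rather than tuned.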

