By leveraging large-scale pre-trained 2D generative models, recent works can generate high-quality novel views from a single in-the-wild image. However, because they lack information from multiple views, these works have difficulty generating controllable novel views. In this paper, we present DreamComposer, a flexible and scalable framework that enhances existing view-aware diffusion models by injecting multi-view conditions. Specifically, DreamComposer first uses a view-aware 3D lifting module to obtain 3D representations of an object from multiple views. Then, it renders the latent features of the target view from these 3D representations with the multi-view feature fusion module. Finally, the target-view features extracted from the multi-view inputs are injected into a pre-trained diffusion model. Experiments show that DreamComposer is compatible with state-of-the-art diffusion models for zero-shot novel view synthesis and further enhances them to generate high-fidelity novel view images under multi-view conditions, ready for controllable 3D object reconstruction and various other applications.
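The three stages described above (lifting, fusion and rendering, injection) can be illustrated with a toy NumPy sketch. This is not the DreamComposer implementation: the function names `lift_view`, `render_target`, `fuse_views`, and `inject` are hypothetical, the "lifting" and "rendering" steps are trivial stand-ins, and the cosine-based azimuth weighting and additive injection are assumptions made only to show how data might flow between the stages.

```python
import numpy as np


def lift_view(latent: np.ndarray, depth: int = 4) -> np.ndarray:
    """Toy stand-in for 3D lifting: repeat a (C, H, W) latent along a
    new depth axis to obtain a (C, D, H, W) feature volume."""
    return np.repeat(latent[:, None, :, :], depth, axis=1)


def render_target(volume: np.ndarray) -> np.ndarray:
    """Toy stand-in for rendering: average a (C, D, H, W) volume over
    depth to get a (C, H, W) target-view feature map."""
    return volume.mean(axis=1)


def fuse_views(rendered, azimuths_deg, target_azimuth_deg):
    """Fuse per-view feature maps with weights that favor input views
    closer in azimuth to the target view (assumed weighting scheme)."""
    az = np.asarray(azimuths_deg, dtype=float)
    # Wrapped angular distance in [0, 180] degrees.
    diff = np.abs((az - target_azimuth_deg + 180.0) % 360.0 - 180.0)
    weights = (np.cos(np.deg2rad(diff)) + 1.0) / 2.0 + 1e-8
    weights = weights / weights.sum()
    stacked = np.stack(rendered)                    # (N, C, H, W)
    fused = np.tensordot(weights, stacked, axes=1)  # (C, H, W)
    return fused, weights


def inject(base_latent: np.ndarray, fused: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Assumed additive injection of the fused target-view feature
    into a diffusion model's latent."""
    return base_latent + scale * fused


# Usage: two input views (front and back), target view at the front.
rng = np.random.default_rng(0)
views = [rng.normal(size=(4, 8, 8)) for _ in range(2)]
rendered = [render_target(lift_view(v)) for v in views]
fused, weights = fuse_views(rendered, [0.0, 180.0], target_azimuth_deg=0.0)
conditioned = inject(np.zeros((4, 8, 8)), fused, scale=0.5)
```

In this toy setup the front view dominates the fused feature because it coincides with the target azimuth; a real system would replace each stand-in with learned modules.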