Recent advances in 3D generation have been remarkable, with methods such as DreamFusion leveraging large-scale text-to-image diffusion models to guide 3D object generation. These methods enable the synthesis of detailed, photorealistic textured objects. However, the appearance of 3D objects produced by such text-to-3D models is often unpredictable, and single-image-to-3D methods struggle with images that lack a clear subject, complicating the generation of appearance-controllable 3D objects from complex images. To address these challenges, we present IPDreamer, a novel method that captures intricate appearance features from complex $\textbf{I}$mage $\textbf{P}$rompts and aligns the synthesized 3D object with these extracted features, enabling high-fidelity, appearance-controllable 3D object generation. Our experiments demonstrate that IPDreamer consistently generates high-quality 3D objects that align with both the textual and complex image prompts, highlighting its promising capability for appearance-controlled, complex 3D object generation. Our code is available at https://github.com/zengbohan0217/IPDreamer.