Diffusion-based text-to-image models have attracted immense attention from the vision community, artists, and content creators. Their broad adoption is due to significant improvements in generation quality and to efficient conditioning on various modalities, not just text. However, lifting the rich generative priors of these 2D models into 3D is challenging. Recent works have proposed various pipelines built on the entanglement of diffusion models and neural fields. We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools and demonstrate their ability to work together in a non-learned fashion. Such modularity has the intrinsic advantage of making partial upgrades easy, which has become an important property in such a fast-paced domain. Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools, and outputs the painted input geometry in several formats. We conduct a large-scale study on a wide range of objects and categories from the ShapeNetSem dataset and demonstrate the advantages of our approach, both qualitatively and quantitatively. Project page: https://www.obukhov.ai/repainting_3d_assets