Diffusion-based text-to-image models have attracted immense attention from the vision community, artists, and content creators. Their broad adoption is due to significant improvements in generation quality and to efficient conditioning on various modalities, not just text. However, lifting the rich generative priors of these 2D models into 3D is challenging. Recent works have proposed various pipelines that entangle diffusion models with neural fields. We explore the power of pretrained 2D diffusion models and standard 3D neural radiance fields as independent, standalone tools and demonstrate their ability to work together in a non-learned fashion. Such modularity has the intrinsic advantage of easier partial upgrades, which has become an important property in such a fast-paced domain. Our pipeline accepts any legacy renderable geometry, such as textured or untextured meshes, orchestrates the interaction between 2D generative refinement and 3D consistency enforcement tools, and outputs the painted input geometry in several formats. We conduct a large-scale study on a wide range of objects and categories from the ShapeNetSem dataset and demonstrate the advantages of our approach, both qualitatively and quantitatively. Project page: https://www.obukhov.ai/repainting_3d_assets