Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computation to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs, and still struggle with the inherent ambiguity between lighting and material. In contrast, learning-based approaches leverage rich material priors from existing 3D object datasets but face challenges in maintaining multi-view consistency. In this paper, we introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations. Our method achieves accurate and multi-view consistent estimation of surface normals and material properties. This is made possible through a novel cross-view, cross-domain attention module and an illumination-augmented, view-adaptive training strategy. Additionally, we introduce ARB-Objaverse, a new dataset that provides large-scale multi-view intrinsic data and renderings under diverse lighting conditions, supporting robust training. Extensive experiments demonstrate that IDArb outperforms state-of-the-art methods both qualitatively and quantitatively. Moreover, our approach facilitates a range of downstream tasks, including single-image relighting, photometric stereo, and 3D reconstruction, highlighting its broad applicability to realistic 3D content creation.
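To make the cross-view, cross-domain idea concrete, the following is a minimal NumPy sketch of how attention can mix information jointly across views and intrinsic domains (e.g. normal, albedo, metallic-roughness). The tensor layout, domain count, and function name are illustrative assumptions, not the authors' implementation, which operates inside a diffusion U-Net.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_cross_domain_attention(tokens):
    """tokens: (V, D, N, C) array — V views, D intrinsic domains,
    N spatial tokens per image, C channels (layout is assumed).
    Flattens views and domains into one joint sequence so a single
    attention pass exchanges information across both axes, then
    restores the original shape."""
    V, D, N, C = tokens.shape
    x = tokens.reshape(V * D * N, C)            # joint token sequence
    weights = softmax((x @ x.T) / np.sqrt(C))   # scaled dot-product attention
    out = weights @ x                           # attention-weighted mixture
    return out.reshape(V, D, N, C)

# 2 views, 3 domains, 4 spatial tokens, 8 channels
x = np.random.default_rng(0).normal(size=(2, 3, 4, 8))
y = cross_view_cross_domain_attention(x)
print(y.shape)  # (2, 3, 4, 8)
```

Because every token attends to tokens from all views and all domains, the output for one view's normal map can be informed by another view's albedo, which is one plausible route to the multi-view consistency the abstract describes.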