Existing 2D methods utilize UNet-based diffusion models to generate multi-view physically-based rendering (PBR) maps but struggle with multi-view inconsistency, while some 3D methods generate UV maps directly but suffer from poor generalization due to limited 3D training data. To address these problems, we propose a two-stage approach consisting of multi-view generation and UV material refinement. In the generation stage, we adopt a Diffusion Transformer (DiT) model to generate PBR materials, where both the specially designed multi-branch DiT and the reference-based DiT blocks employ a global attention mechanism to promote feature interaction and fusion across views, thereby improving multi-view consistency. In addition, we adopt a PBR-based diffusion loss to ensure that the generated materials conform to realistic physical principles. In the refinement stage, we propose a material-refined DiT that inpaints empty regions and enhances details in UV space. In addition to the normal-map condition, this refinement also takes the material map from the generation stage as an extra condition to reduce the learning difficulty and improve generalization. Extensive experiments show that our method achieves state-of-the-art performance in texturing 3D objects with PBR materials and offers significant advantages for graphics relighting applications. Project Page: https://lingtengqiu.github.io/2024/MCMat/