Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior into 3D space. However, such a 2D generative image prior bakes the effects of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spuriously correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets plausibly in novel scenes, which limits their use in downstream applications. In contrast, humans can effortlessly resolve this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To support learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
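The UV-stack fusion described above can be illustrated with a minimal sketch. It is not the implementation used in MaterialSeg3D: the abstract does not specify how the voting weights are computed or how regions are defined, so the per-texel `weights` array, the `regions` part-ID map, the `unseen` marker, and both function names are assumptions made only for illustration. Each view is assumed to contribute one integer material-label map in UV space.

```python
import numpy as np


def fuse_uv_stack(labels, weights, num_classes, unseen=-1):
    """Fuse per-view UV label maps by weighted voting (illustrative sketch).

    labels:  (V, H, W) integer material labels, `unseen` where a view
             does not cover the texel.
    weights: (V, H, W) non-negative vote weights (hypothetical, e.g. a
             per-texel visibility or confidence score).
    """
    labels = np.asarray(labels)
    weights = np.asarray(weights, dtype=np.float64)

    # Accumulate, for every class, the total weight of the views voting for it.
    votes = np.zeros((num_classes,) + labels.shape[1:], dtype=np.float64)
    for c in range(num_classes):
        votes[c] = np.where(labels == c, weights, 0.0).sum(axis=0)

    # The class with the largest accumulated weight wins each texel.
    fused = votes.argmax(axis=0)
    # Texels that received no votes at all stay marked as unseen.
    fused[votes.sum(axis=0) == 0] = unseen
    return fused


def unify_regions(fused, regions, unseen=-1):
    """Assign each region its majority label (assumed form of unification).

    regions: (H, W) integer part IDs; a hypothetical input, since the
             abstract does not say how object parts are delineated.
    """
    fused = np.asarray(fused)
    regions = np.asarray(regions)
    out = fused.copy()
    for r in np.unique(regions):
        mask = regions == r
        seen = fused[mask]
        seen = seen[seen != unseen]
        if seen.size:  # leave regions with no observed texels untouched
            out[mask] = np.bincount(seen).argmax()
    return out
```

In this sketch a texel's label is decided by the summed weight per class rather than by a raw count of views, so a single high-weight view can outvote several low-weight ones. The unification step then overwrites every texel in a region with that region's most frequent label, which also fills unseen texels inside an otherwise observed region.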