MaterialSeg3D: Segmenting Dense Materials from 2D Priors for 3D Assets

Driven by powerful image diffusion models, recent research has achieved the automatic creation of 3D objects from textual or visual guidance. By performing score distillation sampling (SDS) iteratively across different views, these methods succeed in lifting the 2D generative prior to 3D space. However, such a 2D generative image prior bakes the effects of illumination and shadow into the texture. As a result, material maps optimized by SDS inevitably contain spurious, correlated components. The absence of a precise material definition makes it infeasible to relight the generated assets plausibly in novel scenes, which limits their use in downstream applications. In contrast, humans can effortlessly circumvent this ambiguity by deducing the material of an object from its appearance and semantics. Motivated by this insight, we propose MaterialSeg3D, a 3D asset material generation framework that infers the underlying material from a 2D semantic prior. Based on such a prior model, we devise a mechanism to parse material in 3D space. We maintain a UV stack, each map of which is unprojected from a specific viewpoint. After traversing all viewpoints, we fuse the stack through a weighted voting scheme and then employ region unification to ensure the coherence of the object parts. To fuel the learning of the semantic prior, we collect a material dataset, named Materialized Individual Objects (MIO), which features abundant images, diverse categories, and accurate annotations. Extensive quantitative and qualitative experiments demonstrate the effectiveness of our method.
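The fusion step described above (a UV stack merged by weighted voting, followed by region unification) might look roughly like the following sketch. The abstract does not specify how per-view weights are computed or how object parts are delineated, so everything here is an assumption for illustration: the names `fuse_uv_stack`, `unify_regions`, `UNSEEN`, and `part_ids` are hypothetical, the weights are taken as given per-texel confidences, and region unification is approximated by a simple majority vote within each part. It is not the authors' implementation.

```python
from collections import defaultdict

UNSEEN = -1  # hypothetical marker for texels that a view does not cover


def fuse_uv_stack(label_stack, weight_stack):
    """Fuse per-view UV label maps by weighted voting (illustrative only).

    label_stack[v][y][x]  : material id predicted from view v, or UNSEEN
    weight_stack[v][y][x] : non-negative confidence of that prediction
    """
    height, width = len(label_stack[0]), len(label_stack[0][0])
    fused = [[UNSEEN] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            votes = defaultdict(float)
            for labels, weights in zip(label_stack, weight_stack):
                if labels[y][x] != UNSEEN:
                    votes[labels[y][x]] += weights[y][x]
            if votes:  # texels seen by no view stay UNSEEN
                fused[y][x] = max(votes, key=votes.get)
    return fused


def unify_regions(fused, part_ids):
    """Assign each part the label that most of its covered texels received.

    This majority vote is an assumed stand-in for the paper's region
    unification; part_ids[y][x] is a hypothetical per-texel part index.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for row_labels, row_parts in zip(fused, part_ids):
        for label, part in zip(row_labels, row_parts):
            if label != UNSEEN:
                counts[part][label] += 1
    majority = {part: max(c, key=c.get) for part, c in counts.items()}
    return [[majority.get(part, UNSEEN) for part in row] for row in part_ids]


# Three views of a 1x3 UV map; the last texel is covered by no view.
label_stack = [[[0, 1, UNSEEN]], [[1, 1, UNSEEN]], [[0, UNSEEN, UNSEEN]]]
weight_stack = [[[0.5, 0.2, 0.0]], [[0.4, 0.9, 0.0]], [[0.3, 0.0, 0.0]]]
part_ids = [[7, 8, 8]]

fused = fuse_uv_stack(label_stack, weight_stack)
unified = unify_regions(fused, part_ids)
```

In this toy input the first texel receives a total weight of 0.8 for material 0 against 0.4 for material 1, so material 0 wins the vote. The uncovered third texel then inherits material 1 from the rest of part 8 during unification, which illustrates how a part-level pass can fill holes and keep each part coherent.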
