Decomposing 3D assets into material parts is a common task for artists, yet it remains a highly manual process. In this work, we introduce Select Any Material (SAMa), a material selection approach for in-the-wild objects in arbitrary 3D representations. Building on SAM2's video prior, we construct a material-centric video dataset that extends this prior to the material domain. We propose an efficient way to lift the model's 2D predictions to 3D: each view is projected, using depth, into an intermediate 3D similarity point cloud. Nearest-neighbor lookups between any 3D representation and this similarity point cloud let us efficiently reconstruct accurate selection masks over an object's surface that can be inspected from any view. Our method is multiview-consistent by design, alleviating the need for costly per-asset optimization, and performs optimization-free selection in seconds. SAMa outperforms several strong baselines in selection accuracy and multiview consistency, and it enables compelling applications such as replacing the diffuse-textured materials of a text-to-3D output with PBR materials, or selecting and editing materials on NeRF and 3DGS captures.
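To make the lifting step concrete, the following is a minimal sketch (not the authors' implementation) of the pipeline described above: per-view 2D similarity maps are unprojected into a shared world-space similarity point cloud using depth, and a selection mask for arbitrary surface points of any 3D representation is recovered by nearest-neighbor lookup. The pinhole camera convention, data layout, and the names `unproject`, `build_similarity_cloud`, and `select` are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def unproject(depth, similarity, K, c2w):
    """Lift one view's per-pixel similarity scores into world-space 3D points.

    Assumes a pinhole intrinsic matrix K and a 4x4 camera-to-world matrix c2w.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project each pixel to camera space using the pinhole model.
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Transform camera-space points into world space.
    pts_world = pts_cam @ c2w[:3, :3].T + c2w[:3, 3]
    return pts_world, similarity.reshape(-1)

def build_similarity_cloud(views):
    """Fuse all views into a single similarity point cloud.

    Each view is a dict with keys "depth", "similarity", "K", "c2w" (assumed layout).
    """
    pts, sims = zip(*(unproject(v["depth"], v["similarity"], v["K"], v["c2w"])
                      for v in views))
    return np.concatenate(pts), np.concatenate(sims)

def select(surface_points, cloud_pts, cloud_sims, threshold=0.5):
    """Selection mask for query points on any 3D surface via nearest-neighbor lookup."""
    tree = cKDTree(cloud_pts)
    _, idx = tree.query(surface_points, k=1)  # nearest cloud point per query
    return cloud_sims[idx] > threshold
```

Because the lookup is a pure nearest-neighbor query against a precomputed cloud, no per-asset optimization is involved, which is consistent with the abstract's claim of optimization-free selection in seconds.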