The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM). However, achieving similar success for 3D models remains challenging due to issues such as non-unified data formats, lightweight model architectures, and the scarcity of labeled data with diverse masks. To this end, we propose Point-SAM, a 3D promptable segmentation model focused on point clouds. Our approach uses a transformer-based architecture to extend SAM to the 3D domain. We leverage part-level and object-level annotations and introduce a data engine that generates pseudo labels from SAM, thereby distilling 2D knowledge into our 3D model. Our model outperforms state-of-the-art models on several indoor and outdoor benchmarks and demonstrates a variety of applications, such as 3D annotation. Code and a demo are available at https://github.com/zyc00/Point-SAM.