We introduce Open3DIS, a novel solution for open-vocabulary instance segmentation in 3D scenes. Objects in 3D environments vary widely in shape, scale, and color, which makes precise instance-level identification challenging. Recent open-vocabulary scene understanding methods have made significant progress on this problem by using class-agnostic 3D instance proposal networks to localize objects and by learning queryable features for each 3D mask. Although these methods produce high-quality instance proposals, they struggle to identify small and geometrically ambiguous objects. To address this limitation, the key idea of our method is a new module that aggregates 2D instance masks across frames and maps them to geometrically coherent point cloud regions, which serve as high-quality object proposals. These proposals are then combined with the 3D class-agnostic instance proposals to cover a wide range of real-world objects. We validate our approach on three prominent datasets, ScanNet200, S3DIS, and Replica, and demonstrate significant performance gains over state-of-the-art approaches in segmenting objects across diverse categories.
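The following is a minimal, hypothetical sketch of the proposal pipeline outlined above. It is not the Open3DIS implementation. It assumes that the "geometrically coherent point cloud regions" are precomputed superpoints (one integer id per point), and that each frame supplies a point-to-pixel projection for the visible points. The coverage threshold, the greedy IoU-based merging across frames, and the IoU-based deduplication against 3D proposals are simple stand-ins chosen for illustration. All function names and parameters are invented for this sketch.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical sketch: superpoints stand in for "geometrically coherent regions".

def lift_mask_to_superpoints(
    point_to_pixel: Dict[int, Tuple[int, int]],  # visible point id -> (row, col) in this frame
    mask_pixels: Set[Tuple[int, int]],           # pixels covered by one 2D instance mask
    superpoint_of: List[int],                    # superpoint id for every point
    min_coverage: float = 0.5,                   # assumed threshold, not from the paper
) -> Set[int]:
    """Keep superpoints whose visible points fall inside the 2D mask often enough."""
    counts: Dict[int, List[int]] = {}  # superpoint id -> [inside, visible]
    for pid, pix in point_to_pixel.items():
        c = counts.setdefault(superpoint_of[pid], [0, 0])
        c[1] += 1
        if pix in mask_pixels:
            c[0] += 1
    return {sp for sp, (inside, total) in counts.items() if inside / total >= min_coverage}

def to_points(sp_ids: Set[int], superpoint_of: List[int]) -> FrozenSet[int]:
    """Expand a set of superpoint ids into the point ids they contain."""
    return frozenset(i for i, sp in enumerate(superpoint_of) if sp in sp_ids)

def iou(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def aggregate_across_frames(
    per_frame: List[Set[int]], superpoint_of: List[int], merge_iou: float = 0.5
) -> List[FrozenSet[int]]:
    """Greedily merge lifted masks from different frames that overlap (order-dependent)."""
    groups: List[Set[int]] = []
    for sps in per_frame:
        pts = to_points(sps, superpoint_of)
        for g in groups:
            if iou(pts, to_points(g, superpoint_of)) >= merge_iou:
                g |= sps  # same object seen in another frame: union its superpoints
                break
        else:
            groups.append(set(sps))
    return [to_points(g, superpoint_of) for g in groups]

def combine_with_3d(
    lifted: List[FrozenSet[int]], proposals_3d: List[FrozenSet[int]], dup_iou: float = 0.5
) -> List[FrozenSet[int]]:
    """Keep all 3D proposals; add lifted ones that no already-kept proposal duplicates."""
    combined = list(proposals_3d)
    for p in lifted:
        if all(iou(p, q) < dup_iou for q in combined):
            combined.append(p)
    return combined
```

A toy usage example: six points in three superpoints, two frames that both observe the same small object, and a 3D network that found only a different object:

```python
sp = [0, 0, 1, 1, 2, 2]
a = lift_mask_to_superpoints({0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}, {(0, 0), (0, 1)}, sp)
b = lift_mask_to_superpoints({0: (5, 5), 1: (5, 6), 4: (7, 7), 5: (7, 8)}, {(5, 5), (5, 6), (7, 7)}, sp)
lifted = aggregate_across_frames([a, b], sp)       # [frozenset({0, 1, 4, 5})]
print(combine_with_3d(lifted, [frozenset({2, 3})]))  # 3D proposal plus the lifted one
```

Working at the superpoint level, rather than on raw points, is what keeps each lifted proposal aligned with the scene geometry even when the 2D mask boundary is noisy.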