Semantic, instance, and panoptic segmentation of 3D point clouds have so far been addressed with task-specific models of distinct design. As a result, the similarity of these segmentation tasks and the implicit relationship between them have not been exploited effectively. This paper presents a unified, simple, and effective model that addresses all three tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder that takes unified instance and semantic queries as input. This design allows the model to be trained end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, OneFormer3D ranks 1st and sets a new state of the art (+2.1 mAP50) on the ScanNet test leaderboard. We also demonstrate state-of-the-art results in semantic, instance, and panoptic segmentation on the ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.
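The kernel-based design can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the OneFormer3D implementation: the abstract does not say how a kernel produces its mask, so the sketch assumes a dot product between each kernel and per-point features followed by a sigmoid. The function name `kernel_masks`, the tensor sizes, and the random stand-ins for the decoder output and point features are all hypothetical.

```python
import numpy as np


def kernel_masks(point_features, kernels):
    """Return one soft mask per kernel over all points.

    Assumption (not stated in the abstract): a mask is the sigmoid of the
    dot product between a kernel and each point's feature vector.
    """
    logits = kernels @ point_features.T  # shape: (num_kernels, num_points)
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
num_points, dim = 1000, 32
num_instance_queries, num_semantic_classes = 5, 3

# Random stand-in for per-point features of a 3D point cloud.
point_features = rng.standard_normal((num_points, dim))

# Random stand-in for the kernels. In the paper these come from a
# transformer-based decoder fed with unified instance and semantic queries.
kernels = rng.standard_normal((num_instance_queries + num_semantic_classes, dim))

masks = kernel_masks(point_features, kernels)

# The first group of kernels yields instance masks (one binary mask each).
instance_masks = masks[:num_instance_queries] > 0.5

# The remaining kernels yield one mask per semantic category; here each
# point takes the category whose mask scores highest.
semantic_labels = masks[num_instance_queries:].argmax(axis=0)
```

The point of the sketch is that a single set of kernels, applied to the same point features, covers both kinds of output: some rows of `masks` are read as instance masks and the others as semantic-category masks, which is what lets one model serve several segmentation tasks.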