We propose UniSeg3D, a unified 3D segmentation framework that performs panoptic, semantic, instance, interactive, referring, and open-vocabulary semantic segmentation within a single model. Most previous 3D segmentation approaches are specialized for a single task, which limits their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method maps all six tasks into unified representations processed by the same Transformer. This facilitates inter-task knowledge sharing and thereby promotes comprehensive 3D scene understanding. To take advantage of this multi-task unification, we further improve performance by exploiting the connections between tasks. Specifically, we design a knowledge distillation method and a contrastive learning method that transfer task-specific knowledge across tasks, so that each task benefits from what the others learn. Experiments on three benchmarks, ScanNet20, ScanRefer, and ScanNet200, demonstrate that UniSeg3D consistently outperforms current state-of-the-art (SOTA) methods, even those specialized for individual tasks. We hope UniSeg3D can serve as a solid unified baseline and inspire future work. The code will be available at https://dk-liang.github.io/UniSeg3D/.