The recent success of vision foundation models has shown promising performance on 2D perception tasks. However, it is difficult to train a 3D foundation network directly due to limited datasets, and it remains underexplored whether existing foundation models can be lifted to 3D space seamlessly. In this paper, we present PointSeg, a novel training-free paradigm that leverages off-the-shelf vision foundation models to address 3D scene perception tasks. PointSeg can segment anything in 3D scenes by acquiring accurate 3D prompts to align their corresponding pixels across frames. Concretely, we design a two-branch prompt learning structure to construct 3D point-box prompt pairs, combined with a bidirectional matching strategy for accurate point and proposal prompt generation. Then, we perform iterative post-refinement adaptively when cooperating with different vision foundation models. Moreover, we design an affinity-aware merging algorithm to improve the final ensemble masks. PointSeg demonstrates impressive segmentation performance across various datasets, all without training. Specifically, our approach significantly surpasses the state-of-the-art specialist training-free model by 14.1$\%$, 12.3$\%$, and 12.6$\%$ mAP on the ScanNet, ScanNet++, and KITTI-360 datasets, respectively. On top of that, PointSeg can incorporate various foundation models and even surpasses specialist training-based methods by 3.4$\%$-5.4$\%$ mAP across various datasets, serving as an effective generalist model.
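To make the bidirectional matching idea concrete, here is a minimal sketch assuming it reduces to mutual-nearest-neighbor pairing over a similarity matrix between projected 3D point prompts and 2D box proposals. The names `sim` and `bidirectional_match` are illustrative assumptions, not the paper's actual API, and the real method may use a different scoring and matching criterion.

```python
import numpy as np

def bidirectional_match(sim: np.ndarray) -> list[tuple[int, int]]:
    """Keep only mutually-best (point prompt, box proposal) pairs.

    sim[i, j] is an assumed similarity score between 3D point prompt i
    (projected into the frame) and 2D box proposal j.
    """
    best_box_for_point = sim.argmax(axis=1)  # each point's favorite box
    best_point_for_box = sim.argmax(axis=0)  # each box's favorite point
    return [
        (i, j)
        for i, j in enumerate(best_box_for_point)
        if best_point_for_box[j] == i  # keep the pair only on mutual agreement
    ]

# Toy usage: 3 point prompts vs. 2 box proposals.
sim = np.array([[0.9, 0.1],
                [0.2, 0.8],
                [0.7, 0.3]])
print(bidirectional_match(sim))  # [(0, 0), (1, 1)]; point 2 has no mutual match
```

The mutual-agreement test is what makes the matching bidirectional: a pair survives only if the point prefers the box and the box prefers the point, which filters out one-sided, ambiguous assignments.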