Position-Guided Point Cloud Panoptic Segmentation Transformer

DEtection TRansformer (DETR) started a trend of using a group of learnable queries for unified visual perception. This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline. Although the naive adaptation obtains fair results, its instance segmentation performance is noticeably inferior to that of previous works. Looking into the details, we observe that instances in sparse point clouds are small relative to the whole scene and often have similar geometry but lack a distinctive appearance for segmentation, a combination that is rare in the image domain. Since instances in 3D are characterized mainly by their positional information, we emphasize the role of position during modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. MPE is embedded into the backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA). Together, these designs push the queries to attend to specific regions and to identify distinct instances. The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on the SemanticKITTI and nuScenes benchmarks, respectively. The source code and models are available at https://github.com/SmartBot-PJLab/P3Former.
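For readers unfamiliar with the PQ (Panoptic Quality) metric quoted above, the sketch below shows how PQ is commonly computed for a single class: the summed IoU of matched segment pairs divided by |TP| + 0.5|FP| + 0.5|FN|, where a predicted segment and a ground-truth segment count as matched when their IoU exceeds 0.5. The function name `panoptic_quality` and its inputs are illustrative assumptions and are not taken from the P3Former code base; the segment matching itself, ignore labels, and the averaging over classes done by the benchmark evaluation tools are omitted.

```python
from typing import List


def panoptic_quality(matched_ious: List[float], num_pred: int, num_gt: int) -> float:
    """Panoptic Quality for one class (illustrative helper, not the official scorer).

    matched_ious: IoU of every matched (prediction, ground truth) pair;
                  a pair counts as matched only if its IoU is above 0.5.
    num_pred:     total number of predicted segments of this class.
    num_gt:       total number of ground-truth segments of this class.
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU > 0.5")

    tp = len(matched_ious)   # matched segments
    fp = num_pred - tp       # predictions without a match
    fn = num_gt - tp         # ground-truth segments without a match
    if fp < 0 or fn < 0:
        raise ValueError("more matches than segments")

    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0           # class absent from both prediction and ground truth
    return sum(matched_ious) / denom


if __name__ == "__main__":
    # Two matches, one unmatched prediction, two unmatched ground-truth segments.
    print(panoptic_quality([0.8, 0.6], num_pred=3, num_gt=4))
```

A benchmark-level PQ is then typically the mean of the per-class values, so a single reported PQ figure reflects both how well segments are recognized and how well their masks overlap.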


