2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision

We present a Multimodal Interlaced Transformer (MIT) that jointly considers 2D and 3D data for weakly supervised point cloud segmentation. Prior studies have shown that 2D and 3D features are complementary for point cloud segmentation. However, existing methods require extra 2D annotations to achieve 2D-3D information fusion. Given the high annotation cost of point clouds, effective 2D and 3D feature fusion based on weakly supervised learning is in great demand. To this end, we propose a transformer model with two encoders and one decoder for weakly supervised point cloud segmentation using only scene-level class tags. Specifically, the two encoders compute the self-attended features for 3D point clouds and 2D multi-view images, respectively. The decoder implements interlaced 2D-3D cross-attention and carries out implicit 2D and 3D feature fusion. We alternately switch the roles of queries and key-value pairs in the decoder layers, so that the 2D and 3D features are iteratively enriched by each other. Experiments show that MIT outperforms existing weakly supervised point cloud segmentation methods by a large margin on the S3DIS and ScanNet benchmarks. The project page will be available at https://jimmy15923.github.io/mit_web/.
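The interlaced cross-attention described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single-head attention, the residual update, the random weights, and the choice to let the 3D features act as queries in the first layer are all assumptions made for illustration. The encoders, class tokens, and the scene-level classification loss are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values, w_q, w_k, w_v):
    """Single-head cross-attention with a residual connection (assumed design)."""
    q = queries @ w_q
    k = keys_values @ w_k
    v = keys_values @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Each query token gathers information from the other modality.
    return queries + softmax(scores) @ v


def interlaced_decoder(feat_3d, feat_2d, layers):
    """Alternate which modality supplies queries and which supplies key-value pairs.

    Even layers: 3D features query the 2D features (assumed starting order).
    Odd layers:  2D features query the updated 3D features.
    """
    for i, (w_q, w_k, w_v) in enumerate(layers):
        if i % 2 == 0:
            feat_3d = cross_attention(feat_3d, feat_2d, w_q, w_k, w_v)
        else:
            feat_2d = cross_attention(feat_2d, feat_3d, w_q, w_k, w_v)
    return feat_3d, feat_2d


# Hypothetical sizes: 100 3D tokens, 60 2D tokens, 16-dimensional features.
rng = np.random.default_rng(0)
dim = 16
feat_3d = rng.standard_normal((100, dim))  # stand-in for 3D encoder output
feat_2d = rng.standard_normal((60, dim))   # stand-in for 2D encoder output
layers = [
    tuple(rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
    for _ in range(4)
]

out_3d, out_2d = interlaced_decoder(feat_3d, feat_2d, layers)
print(out_3d.shape, out_2d.shape)
```

Because each layer updates only the modality acting as the query, both token sets keep their original shapes while progressively absorbing information from the other modality, which is the sense in which the fusion is implicit.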


