X-3D: Explicit 3D Structure Modeling for Point Cloud Recognition

Many prior studies build a relation vector for each neighborhood point, generate a dynamic kernel for each vector, and embed these into high-dimensional spaces to capture implicit local structures. However, we contend that such implicit high-dimensional structure modeling approaches inadequately represent the local geometric structure of point clouds due to the absence of explicit structural information. Hence, we introduce X-3D, an explicit 3D structure modeling approach. X-3D captures the explicit local structural information within the input 3D space and uses it to produce dynamic kernels with shared weights for all neighborhood points within the current local region. This modeling approach introduces an effective geometric prior and significantly reduces the disparity between the local structure of the embedding space and that of the original input point cloud, thereby improving the extraction of local features. Experiments show that our method can be applied to a variety of methods and achieves state-of-the-art performance on segmentation, classification, and detection tasks at low extra computational cost, such as 90.7% on ScanObjectNN for classification; 79.2% on S3DIS 6-fold and 74.3% on S3DIS Area 5 for segmentation; 76.3% on ScanNetV2 for segmentation; and 64.5% mAP and 46.9% mAP on SUN RGB-D and 69.0% mAP and 51.1% mAP on ScanNetV2. Our code is available at https://github.com/sunshuofeng/X-3D.
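
The idea of one structure-conditioned kernel shared by a whole neighborhood can be illustrated with a minimal sketch. This is not the authors' implementation: the descriptor (mean offset plus covariance of the neighbor offsets), the random linear "generator", and all function names below are assumptions chosen only to make the data flow concrete. The sketch computes an explicit descriptor in the input 3D space, maps it to a single kernel, applies that kernel to every neighbor feature, and max-pools the result.

```python
import random

def explicit_structure(center, neighbors):
    """Hypothetical explicit descriptor computed in the input 3D space:
    mean neighbor offset (3 values) + upper triangle of the offset covariance (6 values)."""
    k = len(neighbors)
    offsets = [[n[d] - center[d] for d in range(3)] for n in neighbors]
    mean = [sum(o[d] for o in offsets) / k for d in range(3)]
    cov = [sum((o[a] - mean[a]) * (o[b] - mean[b]) for o in offsets) / k
           for a in range(3) for b in range(a, 3)]
    return mean + cov

def make_generator(desc_dim, c_in, c_out, seed=0):
    """Stand-in for a learned layer: random weights mapping descriptor -> flat kernel."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(desc_dim)]
            for _ in range(c_in * c_out)]

def dynamic_kernel(descriptor, generator, c_in, c_out):
    """Produce ONE (c_out x c_in) kernel for the whole neighborhood."""
    flat = [sum(w * x for w, x in zip(row, descriptor)) for row in generator]
    return [flat[i * c_in:(i + 1) * c_in] for i in range(c_out)]

def aggregate(center, neighbors, features, generator, c_in, c_out):
    """Apply the shared kernel to every neighbor feature, then max-pool per channel."""
    descriptor = explicit_structure(center, neighbors)
    kernel = dynamic_kernel(descriptor, generator, c_in, c_out)
    transformed = [[sum(w * f for w, f in zip(row, feat)) for row in kernel]
                   for feat in features]
    return [max(t[c] for t in transformed) for c in range(c_out)]

# Toy neighborhood: 4 neighbors with 2-channel features, 3 output channels.
center = [0.0, 0.0, 0.0]
neighbors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
generator = make_generator(desc_dim=9, c_in=2, c_out=3)
pooled = aggregate(center, neighbors, features, generator, c_in=2, c_out=3)
```

In contrast to the per-point scheme described above, the kernel here is generated once per local region from geometry alone, so its cost does not grow with one generated kernel per neighbor. Because the assumed descriptor is built from offsets relative to the center, it is unchanged when the whole neighborhood is translated.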


