GraphAlign: Enhancing Accurate Feature Alignment by Graph matching for Multi-Modal 3D Object Detection

LiDAR and cameras are complementary sensors for 3D object detection in autonomous driving. However, it is challenging to model the interaction between point clouds and images, and the critical factor is how to align features across these heterogeneous modalities. Many current methods achieve feature alignment by projection calibration alone, without accounting for errors in coordinate conversion between sensors, which leads to sub-optimal performance. In this paper, we present GraphAlign, a more accurate feature alignment strategy for 3D object detection based on graph matching. Specifically, we fuse image features from a semantic segmentation encoder in the image branch with point cloud features from a 3D Sparse CNN in the LiDAR branch. To save computation, we divide the point cloud features into subspaces and construct nearest-neighbor relationships by computing Euclidean distances within each subspace. Using the projection calibration between the image and the point cloud, we project the nearest neighbors of the point cloud features onto the image features. Then, by matching the nearest neighbors of a single point cloud against multiple images, we search for a more appropriate feature alignment. In addition, we provide a self-attention module that increases the weights of significant relations to fine-tune the feature alignment between heterogeneous modalities. Extensive experiments on the nuScenes benchmark demonstrate the effectiveness and efficiency of GraphAlign.
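The two geometric steps named in the abstract, nearest-neighbor search restricted to subspaces and projection of points onto the image, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the subspaces are formed, so a uniform grid with a hypothetical `cell_size` is assumed here, and the projection is an idealized pinhole model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` applied to points already expressed in the camera frame. The function names are illustrative only.

```python
import math
from collections import defaultdict


def knn_within_subspaces(points, cell_size, k):
    """Return, for each point index, up to k nearest neighbors in the same cell.

    Assumption: subspaces are uniform grid cells of edge length cell_size.
    """
    # Group point indices by the grid cell (subspace) that contains them.
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        key = (
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size),
        )
        cells[key].append(idx)

    neighbors = {}
    for members in cells.values():
        for i in members:
            # Rank only the other points of this cell by Euclidean distance,
            # which avoids comparing every point against every other point.
            ranked = sorted(
                (j for j in members if j != i),
                key=lambda j: math.dist(points[i], points[j]),
            )
            neighbors[i] = ranked[:k]
    return neighbors


def project_to_image(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        return None  # the point lies behind the camera
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    pts = [(0.1, 0.2, 5.0), (0.3, 0.2, 5.1), (0.9, 0.9, 5.9), (3.0, 0.1, 5.0)]
    nn = knn_within_subspaces(pts, cell_size=1.0, k=1)
    for i in sorted(nn):
        # Project each neighbor so it can be looked up in an image feature map.
        pixels = [project_to_image(pts[j], 100.0, 100.0, 50.0, 60.0) for j in nn[i]]
        print(i, nn[i], pixels)
```

Restricting the search to one cell trades some accuracy for speed: a point near a cell boundary may miss a closer neighbor in the adjacent cell, and a point alone in its cell gets no neighbors at all. In a real pipeline the projection would also include the LiDAR-to-camera extrinsic transform, whose inaccuracy is the misalignment the abstract sets out to correct.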


