As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Fusing the two most common types of sensory data, images and point clouds, can improve detection accuracy. However, no existing model can detect an object's position in both point clouds and images while also determining the correspondence between the two detections. This information is invaluable for human-machine interaction, opening new possibilities for its enhancement. In light of this, this paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework that requires only a single forward inference to simultaneously obtain an object's position in both point clouds and images and to establish the correlation between them. Furthermore, to assess the accuracy of the object correspondence between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, extensive experiments were conducted on the KITTI and DAIR-V2X datasets. The study also examines how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are perturbed, compared with existing post-processing methods. The experimental results demonstrate that the proposed method achieves excellent detection performance and robustness, realizing end-to-end consistency detection. The source code will be made publicly available at https://github.com/xifen523/COD.