Towards Consistent Object Detection via LiDAR-Camera Synergy

As human-machine interaction continues to evolve, environmental perception is becoming increasingly important. Integrating the two most common types of sensor data, images and point clouds, can improve detection accuracy. However, no existing model can simultaneously detect an object's position in both point clouds and images and determine the correspondence between the two. This correspondence is valuable for human-machine interaction and opens new possibilities for improving it. In light of this, this paper introduces an end-to-end Consistency Object Detection (COD) framework that, in a single forward pass, obtains an object's position in both the point cloud and the image and establishes their correspondence. Furthermore, to assess the accuracy of object correspondence between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, extensive experiments were conducted on the KITTI and DAIR-V2X datasets. The study also examined how the proposed consistency detection method performs on images, compared with existing post-processing methods, when the calibration parameters between images and point clouds are perturbed. The experimental results show that the proposed method achieves end-to-end consistency detection with excellent detection performance and robustness. The source code will be made publicly available at https://github.com/xifen523/COD.
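The abstract contrasts COD with existing post-processing methods, whose image-side results depend on the LiDAR-camera calibration. Below is a minimal sketch of one generic post-processing baseline of that kind, not the COD method itself. It projects 3D boxes into the image with an assumed 3x4 camera matrix `P` and greedily matches the resulting rectangles to 2D detections by IoU. The helper names (`box_corners`, `project`, `iou`, `associate`), the box parametrization, the `yaw_error` perturbation and all numbers are hypothetical illustrations.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point3 = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_corners(center: Point3, size: Point3, yaw: float) -> List[Point3]:
    """Eight corners of a 3D box in camera coordinates (x right, y down, z forward).

    Assumption (hypothetical convention): `center` is the geometric centre,
    `size` is (length, height, width) and `yaw` rotates about the y axis.
    """
    cx, cy, cz = center
    l, h, w = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate the corner offset about the y axis, then translate it.
                corners.append((cx + c * dx + s * dz, cy + dy, cz - s * dx + c * dz))
    return corners


def project(points: List[Point3], P: Sequence[Sequence[float]], yaw_error: float = 0.0) -> Box2D:
    """Project 3D points with a 3x4 matrix P and return their 2D bounding rectangle.

    `yaw_error` (radians) simulates a miscalibrated extrinsic by rotating every
    point about the camera y axis before projection.
    """
    c, s = math.cos(yaw_error), math.sin(yaw_error)
    us, vs = [], []
    for x, y, z in points:
        x, z = c * x + s * z, -s * x + c * z
        hom = (x, y, z, 1.0)
        u, v, w = (sum(P[r][k] * hom[k] for k in range(4)) for r in range(3))
        if w <= 0:
            raise ValueError("point lies behind the camera")
        us.append(u / w)
        vs.append(v / w)
    return (min(us), min(vs), max(us), max(vs))


def iou(a: Box2D, b: Box2D) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(projected: List[Box2D], detections: List[Box2D], thresh: float = 0.5) -> Dict[int, int]:
    """Greedy one-to-one matching (projected index -> detection index) by descending IoU."""
    pairs = sorted(
        ((iou(p, d), i, j) for i, p in enumerate(projected) for j, d in enumerate(detections)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_j = set()
    for score, i, j in pairs:
        if score < thresh:
            break  # all remaining pairs overlap even less
        if i not in matches and j not in used_j:
            matches[i] = j
            used_j.add(j)
    return matches


if __name__ == "__main__":
    # Hypothetical pinhole camera: fx = fy = 700 px, principal point (600, 180).
    P = [[700.0, 0.0, 600.0, 0.0], [0.0, 700.0, 180.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    cars = [((-3.0, 1.0, 15.0), (4.0, 1.5, 1.8), 0.0), ((4.0, 1.0, 25.0), (4.0, 1.5, 1.8), 0.0)]
    corners = [box_corners(*car) for car in cars]
    # Stand-in for an image detector: exact projections under the true calibration.
    detections = [project(c, P) for c in corners]
    for err in (0.0, 0.05, 0.1):
        projected = [project(c, P, yaw_error=err) for c in corners]
        ious = [round(iou(p, d), 2) for p, d in zip(projected, detections)]
        print(f"yaw error {err} rad: matches={associate(projected, detections)} IoU={ious}")
```

Because the association is only as good as `P`, a rotational calibration error shifts every projected rectangle and lowers its IoU with the corresponding image detection; this is the kind of sensitivity that the calibration-perturbation experiments probe.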
