As human-machine interaction continues to evolve, the capacity for environmental perception is becoming increasingly crucial. Fusing the two most common types of sensory data, images and point clouds, can improve detection accuracy. However, no existing model can detect an object's position in both point clouds and images while also determining the correspondence between the two detections. This information is invaluable for human-machine interaction, opening new possibilities for its enhancement. In light of this, this paper introduces an end-to-end Consistency Object Detection (COD) algorithm framework that requires only a single forward inference to simultaneously obtain an object's position in both point clouds and images and to establish the correlation between them. Furthermore, to assess the accuracy of the object correspondence between point clouds and images, this paper proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, extensive experiments were conducted on the KITTI and DAIR-V2X datasets. The study also examines how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are perturbed, compared with existing post-processing methods. The experimental results demonstrate that the proposed method achieves excellent detection performance and robustness, realizing end-to-end consistency detection. The source code will be made publicly available at https://github.com/xifen523/COD.