As human-machine interaction continues to evolve, environmental perception is becoming increasingly important. Fusing the two most common types of sensory data, images and point clouds, can improve detection accuracy. However, no existing model can simultaneously detect an object's position in both the point cloud and the image and determine the correspondence between the two detections. This correspondence is valuable for human-machine interaction and opens new possibilities for improving it. To this end, this paper introduces an end-to-end Consistency Object Detection (COD) framework that requires only a single forward pass to obtain an object's position in both the point cloud and the image and to establish the correlation between them. To assess the accuracy of the object correlation between point clouds and images, this paper also proposes a new evaluation metric, Consistency Precision (CP). To verify the effectiveness of the proposed framework, extensive experiments are conducted on the KITTI and DAIR-V2X datasets. The study also examines how the proposed consistency detection method performs on images when the calibration parameters between images and point clouds are disturbed, in comparison with existing post-processing methods. The experimental results demonstrate that the proposed method achieves strong detection performance and robustness while providing end-to-end consistency detection. The source code will be made publicly available at https://github.com/xifen523/COD.
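To illustrate why association by post-processing depends on calibration, the following is a minimal sketch of one generic projection-based baseline: each 3D box is projected into the image with a 3x4 projection matrix and matched to the 2D detection with the highest IoU. It is not the COD method, and the abstract does not specify which post-processing methods were compared. The function names, the axis-aligned box format, the greedy matching rule, the 0.5 threshold, and all numeric values are assumptions chosen for illustration.

```python
from itertools import product


def project_point(P, xyz):
    """Project a 3D point (camera-aligned coordinates) with a 3x4 matrix P.

    Assumes the point lies in front of the camera (positive depth).
    """
    x, y, z = xyz
    h = [P[r][0] * x + P[r][1] * y + P[r][2] * z + P[r][3] for r in range(3)]
    return h[0] / h[2], h[1] / h[2]


def project_box(P, center, size):
    """Project the 8 corners of an axis-aligned 3D box; return the enclosing 2D box."""
    cx, cy, cz = center
    dx, dy, dz = size
    corners = [
        (cx + sx * dx / 2, cy + sy * dy / 2, cz + sz * dz / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]
    uv = [project_point(P, c) for c in corners]
    us = [u for u, _ in uv]
    vs = [v for _, v in uv]
    return min(us), min(vs), max(us), max(vs)


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def associate(P, boxes3d, boxes2d, thr=0.5):
    """For each 3D box, return the index of the best-IoU 2D box, or -1 if below thr."""
    matches = []
    for center, size in boxes3d:
        proj = project_box(P, center, size)
        scores = [iou(proj, b) for b in boxes2d]
        best = max(range(len(scores)), key=scores.__getitem__) if scores else -1
        matches.append(best if best >= 0 and scores[best] >= thr else -1)
    return matches


# Hypothetical calibration and detections (illustrative values only).
P = [[700.0, 0.0, 600.0, 0.0], [0.0, 700.0, 180.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
# Same matrix with the principal point shifted by 120 px to mimic a disturbance.
P_disturbed = [[700.0, 0.0, 720.0, 0.0], [0.0, 700.0, 180.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

boxes3d = [((0.0, 0.0, 10.0), (2.0, 2.0, 2.0))]          # (center, size)
boxes2d = [(100.0, 100.0, 200.0, 200.0), (520.0, 100.0, 680.0, 260.0)]

matches_clean = associate(P, boxes3d, boxes2d)            # [1]
matches_disturbed = associate(P_disturbed, boxes3d, boxes2d)  # [-1]
```

In this toy setup the 3D box is matched to the second 2D detection under the original matrix, while the shifted principal point moves the projection far enough that no 2D detection passes the IoU threshold. An end-to-end approach that predicts the correspondence directly avoids this explicit projection step.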