Manufacturing requires reliable object detection methods for precise picking and handling of diverse types of manufacturing parts and components. Traditional object detection methods utilize either only 2D images from cameras or 3D data from lidars or similar 3D sensors. However, each of these sensors has weaknesses and limitations: cameras lack depth perception, and 3D sensors typically do not carry color information. These weaknesses can undermine the reliability and robustness of industrial manufacturing systems. To address these challenges, this work proposes a multi-sensor system combining a red-green-blue (RGB) camera and a 3D point cloud sensor. The two sensors are calibrated for precise alignment of the multimodal data captured from the two hardware devices. A novel multimodal object detection method is developed to process both RGB and depth data. This object detector is based on the Faster R-CNN baseline, which was originally designed to process only camera images. The results show that the multimodal model significantly outperforms the depth-only and RGB-only baselines on established object detection metrics. More specifically, the multimodal model improves mAP by 13% and raises Mean Precision by 11.8% in comparison to the RGB-only baseline. Compared to the depth-only baseline, it improves mAP by 78% and raises Mean Precision by 57%. Hence, this method facilitates more reliable and robust object detection in service to smart manufacturing applications.