To address the lack of depth information in images and the low detection accuracy of monocular 3D object detection, we propose a multi-scale monocular 3D object detection method based on instance depth. First, to improve the model's handling of targets at different scales, we design a multi-scale perception module based on dilated convolution; because feature maps at different scales are inconsistent with one another, the depth features containing multi-scale information are then refined again along both the spatial and channel dimensions. Second, to give the model better 3D perception, we use instance depth information as an auxiliary learning task that enhances the spatial depth features of 3D targets, and supervise this auxiliary task with sparse instance depth. Finally, we evaluate the proposed algorithm on the KITTI test set and evaluation set. The experimental results show that, compared with the baseline method, the proposed method improves AP40 in the car category by 5.27\%, effectively improving the performance of monocular 3D object detection.
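As a rough illustration of what supervising an auxiliary depth task with sparse instance depth can look like, the sketch below computes a loss only at pixels that carry an instance depth label. The abstract does not state the loss actually used, so the L1 form, the function name `sparse_depth_l1`, and the boolean-mask layout are all assumptions made for illustration.

```python
def sparse_depth_l1(pred, target, mask):
    """Mean absolute depth error over labelled pixels only.

    Hypothetical helper, not the paper's loss.
    pred, target: 2D lists of depth values with the same shape.
    mask: 2D list of booleans, True where an instance depth label exists.
    """
    total = 0.0
    count = 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:  # unlabelled pixels contribute nothing to the loss
                total += abs(p - t)
                count += 1
    # With no labelled pixel, return 0.0 instead of dividing by zero.
    return total / count if count else 0.0


# Toy 2x2 depth map: only two pixels carry an instance depth label.
pred = [[10.0, 22.0], [35.0, 8.0]]
target = [[12.0, 0.0], [0.0, 9.0]]
mask = [[True, False], [False, True]]
loss = sparse_depth_l1(pred, target, mask)
```

Normalising by the number of labelled pixels, rather than by the full image size, keeps the loss magnitude comparable between images with few and many labelled instances.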