3D object detection plays a crucial role in many intelligent vision systems. In the open world, detectors inevitably encounter adverse scenes such as dense fog, heavy rain, and low light. Existing work has made significant progress in 3D object detection by diversifying network architectures and training schemes, but most of the resulting learnable modules fail in adverse scenes, which degrades detection performance. To address this issue, this paper proposes MonoTDP, a monocular 3D detection model that perceives twin depth in adverse scenes and effectively mitigates the loss of detection performance in harsh environments. Specifically, we first introduce an adaptive learning strategy that helps the model handle uncontrollable weather conditions and makes it substantially more robust to a range of degrading factors. Then, to address the loss of depth and content in adverse regions, we propose a novel twin depth perception module that simultaneously estimates scene depth and object depth, enabling the integration of scene-level and object-level features. Additionally, we assemble a new adverse 3D object detection dataset covering a wide range of challenging scenes, including rainy, foggy, and low-light conditions, with 7,481 images for each type of scene. Experimental results show that the proposed method outperforms current state-of-the-art approaches by an average of 3.12% in AP_R40 on the car category across these adverse environments.
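For readers unfamiliar with the AP_R40 metric cited above, the following is a minimal sketch of how a 40-point interpolated average precision is commonly computed from a precision-recall curve. It is not the authors' evaluation code: the function name `ap_r40` and its inputs are hypothetical, and the step that matches detections to ground truth to produce the curve is assumed to have already been done.

```python
import numpy as np

def ap_r40(recall, precision):
    """40-point interpolated average precision (illustrative sketch).

    `recall` and `precision` are matching points on a precision-recall
    curve. How detections are matched to ground truth to obtain them is
    outside the scope of this sketch.
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    # 40 equally spaced recall positions: 1/40, 2/40, ..., 40/40.
    thresholds = np.arange(1, 41) / 40
    total = 0.0
    for t in thresholds:
        mask = recall >= t
        # Interpolated precision: the best precision achieved at any
        # recall >= t, or 0 if the curve never reaches recall t.
        total += precision[mask].max() if mask.any() else 0.0
    return total / 40

# A detector that reaches recall 0.5 at precision 1.0 and goes no further
# covers 20 of the 40 recall positions.
print(ap_r40([0.5], [1.0]))
```

Averaging over positions that start at 1/40 rather than 0 means a curve is not credited for the trivial zero-recall point.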