Current multi-view 3D object detection methods often fail to detect objects in the overlap region between camera views, and the networks' understanding of the scene is often limited to that of a monocular detection network. Moreover, objects in the overlap region are often heavily occluded or deformed by camera distortion, which causes a domain shift. To mitigate these issues, we propose two main modules: (1) Stereo Disparity Estimation for Weak Depth Supervision and (2) an Adversarial Overlap Region Discriminator. The former uses a traditional stereo disparity estimation method to obtain reliable disparity information from the overlap region. Using the disparity estimates as supervision, we regularize the network so that it fully exploits the geometric potential of binocular images, which in turn improves overall detection accuracy. The latter minimizes the representational gap between non-overlap and overlap regions. We demonstrate the effectiveness of the proposed method on nuScenes, a large-scale multi-view 3D object detection dataset. Our experiments show that the proposed method outperforms current state-of-the-art models, namely DETR3D and BEVDet.
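As a rough illustration of how disparity can act as a weak depth signal, the sketch below converts disparity to depth with the standard rectified-stereo relation Z = f * B / d and compares it to a predicted depth only where the disparity is valid. It is not the formulation used in this work: the function names, the focal length and baseline values, and the choice of a masked L1 loss are all assumptions made for illustration, and it assumes the two views have already been rectified.

```python
from typing import List, Optional


def disparity_to_depth(
    disparity: List[float],
    focal_px: float,
    baseline_m: float,
    min_disp: float = 1e-3,
) -> List[Optional[float]]:
    """Convert disparities (pixels) to depths (metres) via Z = f * B / d.

    Pixels whose disparity is at or below min_disp are treated as
    invalid and returned as None (hypothetical convention).
    """
    return [
        focal_px * baseline_m / d if d > min_disp else None
        for d in disparity
    ]


def masked_l1_depth_loss(
    pred_depth: List[float],
    pseudo_depth: List[Optional[float]],
) -> float:
    """Mean absolute error over pixels that have a valid pseudo depth."""
    pairs = [(p, t) for p, t in zip(pred_depth, pseudo_depth) if t is not None]
    if not pairs:
        return 0.0  # no valid supervision in this region
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


# Assumed values: 1000 px focal length, 0.5 m baseline.
pseudo = disparity_to_depth([50.0, 25.0, 0.0], focal_px=1000.0, baseline_m=0.5)
loss = masked_l1_depth_loss([11.0, 18.0, 5.0], pseudo)
```

Masking invalid pixels keeps regions without a reliable disparity estimate from contributing to the supervision signal, which is one common way to treat such estimates as weak rather than dense ground truth.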