Depth estimation is a critical technology in autonomous driving, and multi-camera systems are often used to achieve 360$^\circ$ perception. These 360$^\circ$ camera sets often have limited or low-quality overlap regions, which makes multi-view stereo methods infeasible for the entire image. Monocular methods, on the other hand, may not produce consistent cross-view predictions. To address these issues, we propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation over the full image by explicitly utilizing multi-view stereo results in the overlap regions. We propose building virtual pinhole cameras to resolve the distortion problem of fisheye cameras and to unify the processing of the two types of 360$^\circ$ cameras (fisheye and pinhole). To handle the varying noise in camera poses caused by unstable movement, we employ a self-calibration method to obtain highly accurate relative poses of adjacent cameras with minor overlap. Together, these steps enable the use of robust stereo methods to obtain a high-quality depth prior in the overlap region. This prior serves not only as an additional input but also as pseudo-labels, which enhances the accuracy of depth estimation methods and improves cross-view prediction consistency. We evaluate SGDE on one fisheye camera dataset, Synthetic Urban, and two pinhole camera datasets, DDAD and nuScenes. Our experiments demonstrate that SGDE is effective for both supervised and self-supervised depth estimation, and highlight its potential for advancing downstream autonomous driving tasks such as 3D object detection and occupancy prediction.
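The dual role of the stereo depth prior described above (an additional input and a source of pseudo-labels) can be illustrated with a minimal sketch. This is not the SGDE implementation: the function names `masked_l1_pseudo_label_loss` and `stack_prior_as_input`, the choice of an L1 penalty, and the nested-list depth-map representation are all assumptions made for illustration. Pixels outside the overlap region, where no stereo prior exists, are marked `None`.

```python
from typing import List, Optional

# Hypothetical representation: a depth map is a list of rows; None marks
# pixels outside the overlap region, where no stereo prior is available.
PriorMap = List[List[Optional[float]]]
DepthMap = List[List[float]]


def masked_l1_pseudo_label_loss(pred: DepthMap, prior: PriorMap) -> float:
    """Mean absolute error between predicted depth and the stereo prior,
    computed only over pixels where the prior is valid.

    The L1 penalty is an assumption; the abstract does not name a loss.
    """
    total = 0.0
    count = 0
    for pred_row, prior_row in zip(pred, prior):
        for p, q in zip(pred_row, prior_row):
            if q is not None:
                total += abs(p - q)
                count += 1
    # No valid prior pixels means no supervision from pseudo-labels.
    return total / count if count else 0.0


def stack_prior_as_input(
    image: List[List[List[float]]], prior: PriorMap, fill: float = 0.0
) -> List[List[List[float]]]:
    """Append two channels per pixel: the prior depth (or `fill` where it
    is missing) and a 0/1 validity flag, so a network can tell them apart.
    """
    stacked = []
    for img_row, prior_row in zip(image, prior):
        row = []
        for channels, q in zip(img_row, prior_row):
            depth = fill if q is None else q
            valid = 0.0 if q is None else 1.0
            row.append(list(channels) + [depth, valid])
        stacked.append(row)
    return stacked


if __name__ == "__main__":
    pred = [[10.0, 12.0, 30.0], [11.0, 15.0, 28.0]]
    prior = [[None, 12.5, None], [None, 14.0, None]]  # overlap: middle column
    print(masked_l1_pseudo_label_loss(pred, prior))
```

In this toy case only the middle column lies in the overlap, so the loss averages the two absolute errors there and ignores every other pixel. The explicit validity channel is a design choice of this sketch; it keeps a filled-in depth of zero from being mistaken for a real measurement.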