Depth perception is essential for a robot's spatial and geometric understanding of its environment, and many tasks traditionally rely on hardware-based depth sensors such as RGB-D or stereo cameras. However, these sensors face practical limitations, including difficulty with transparent and reflective objects, high cost, calibration complexity, spatial and energy constraints, and increased failure rates in compound systems. While monocular depth estimation offers a cost-effective and simpler alternative, its adoption in robotics is limited because it outputs relative rather than metric depth, the latter being crucial for robotics applications. In this paper, we propose a method that utilizes a single calibrated camera, enabling the robot to act as a ``measuring stick'' that converts relative depth estimates into metric depth in real time as tasks are performed. Our approach employs an LSTM-based metric depth regressor, trained online and refined through probabilistic filtering, to accurately restore metric depth across the monocular depth map, particularly in areas proximal to the robot's motion. Experiments with real robots demonstrate that our method significantly outperforms current state-of-the-art monocular metric depth estimation techniques, achieving a 22.1% reduction in depth error and a 52% increase in success rate on a downstream task.
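To make the core idea concrete: converting relative depth to metric depth amounts to recovering an unknown scale and shift using sparse metric anchors (e.g., points whose true depth is known from the robot's calibrated geometry). The sketch below is a simplified illustration of this alignment via least squares, not the paper's method, which uses an online-trained LSTM regressor with probabilistic filtering; the function names and the affine depth model are assumptions for exposition.

```python
import numpy as np

def fit_scale_shift(rel_depth, metric_depth):
    """Fit metric ≈ s * rel + t by least squares over sparse anchor points.

    rel_depth:    (N,) relative depths at anchor pixels (from a monocular model)
    metric_depth: (N,) known metric depths at the same pixels (e.g., from
                  the robot's forward kinematics -- a hypothetical source)
    """
    A = np.stack([rel_depth, np.ones_like(rel_depth)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, metric_depth, rcond=None)
    return s, t

def to_metric(rel_map, s, t):
    """Apply the recovered scale and shift to a dense relative depth map."""
    return s * rel_map + t

# Synthetic check: anchors generated with scale 4.0 and shift 0.5.
rel_anchors = np.array([0.1, 0.3, 0.5, 0.8])
metric_anchors = 4.0 * rel_anchors + 0.5
s, t = fit_scale_shift(rel_anchors, metric_anchors)

dense_rel = np.linspace(0.0, 1.0, 5)
dense_metric = to_metric(dense_rel, s, t)
```

In practice a single global scale and shift is often too coarse, which is one motivation for the paper's spatially aware, online-updated regressor: it can restore metric depth more accurately in regions near the robot's motion, where fresh anchors are available.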