Autonomous field robots operating in unstructured environments require robust perception to ensure safe and reliable operation. Recent advances in monocular depth estimation have demonstrated the potential of low-cost cameras as depth sensors; however, their adoption in field robotics remains limited by the absence of reliable scale cues, ambiguous or low-texture conditions, and the scarcity of large-scale datasets. To address these challenges, we propose a depth completion model that is trained on synthetic data and uses extremely sparse measurements from depth sensors to predict dense metric depth in unseen field robotics environments. A synthetic dataset generation pipeline tailored to field robotics enables the creation of multiple realistic training datasets. This pipeline combines textured 3D meshes obtained via Structure from Motion with photorealistic rendering and novel viewpoint synthesis to simulate diverse field robotics scenarios. Our approach achieves an end-to-end latency of 53 ms per frame on an NVIDIA Jetson AGX Orin, enabling real-time deployment on embedded platforms. Extensive evaluation demonstrates competitive performance across diverse real-world field robotics scenarios.