In this paper, we present a fast monocular depth estimation method for enabling 3D perception capabilities of low-cost underwater robots. We formulate a novel end-to-end deep visual learning pipeline named UDepth, which incorporates domain knowledge of the image formation characteristics of natural underwater scenes. First, we adapt a new input space from the raw RGB image space by exploiting an underwater light attenuation prior, and then devise a least-squares formulation for coarse pixel-wise depth prediction. Subsequently, we extend this into a domain projection loss that guides the end-to-end learning of UDepth on over 9K RGB-D training samples. UDepth is designed with a computationally light MobileNetV2 backbone and a Transformer-based optimizer to ensure fast inference rates on embedded systems. Through domain-aware design choices and comprehensive experimental analyses, we demonstrate that it is possible to achieve state-of-the-art depth estimation performance while ensuring a small computational footprint. Specifically, with 70%-80% fewer network parameters than existing benchmarks, UDepth achieves comparable and often better depth estimation performance. While the full model offers inference rates of over 66 FPS on a single GPU and over 13 FPS on a single CPU core, our domain projection for coarse depth prediction runs at 51.5 FPS on single-board NVIDIA Jetson TX2s. The inference pipelines are available at https://github.com/uf-robopi/UDepth.
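To make the coarse prediction step concrete, the following is a minimal sketch of a pixel-wise least-squares depth fit. It is not the released UDepth code. The abstract does not spell out the adapted input space, so the sketch assumes two attenuation-aware features per pixel: the red channel (which attenuates fastest in water) and the maximum of the green and blue channels. The function names `attenuation_features`, `fit_coarse_depth`, and `predict_coarse_depth`, and the three-coefficient linear model, are likewise illustrative assumptions.

```python
import numpy as np


def attenuation_features(rgb):
    """Split an HxWx3 RGB image (floats in [0, 1]) into two assumed features.

    ASSUMPTION: the feature choice (red channel, max of green/blue) is an
    illustrative reading of the "light attenuation prior", not taken from
    the abstract itself.
    """
    r = rgb[..., 0]
    m = np.maximum(rgb[..., 1], rgb[..., 2])
    return r, m


def fit_coarse_depth(rgb_images, depth_maps):
    """Fit d ~ mu0 + mu1 * m + mu2 * r over all pixels by linear least squares."""
    rows, targets = [], []
    for rgb, depth in zip(rgb_images, depth_maps):
        r, m = attenuation_features(rgb)
        # One design-matrix row per pixel: [1, m, r].
        rows.append(np.stack([np.ones(r.size), m.ravel(), r.ravel()], axis=1))
        targets.append(depth.ravel())
    design = np.concatenate(rows, axis=0)
    target = np.concatenate(targets, axis=0)
    mu, *_ = np.linalg.lstsq(design, target, rcond=None)
    return mu


def predict_coarse_depth(rgb, mu):
    """Apply the fitted coefficients to every pixel of one image."""
    r, m = attenuation_features(rgb)
    return mu[0] + mu[1] * m + mu[2] * r


if __name__ == "__main__":
    # Synthetic stand-in for RGB-D pairs; real training data would replace this.
    rng = np.random.default_rng(0)
    images = [rng.random((8, 8, 3)) for _ in range(4)]
    depths = [0.5 + 0.4 * np.maximum(im[..., 1], im[..., 2]) - 0.3 * im[..., 0]
              for im in images]
    mu = fit_coarse_depth(images, depths)
    print("fitted coefficients:", mu)
    print("coarse depth shape:", predict_coarse_depth(images[0], mu).shape)
```

Because the fitted model is a closed-form linear map applied per pixel, inference needs only a handful of arithmetic operations per pixel, which is consistent with the abstract's claim that the coarse predictor runs quickly on embedded hardware.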