In this work, we address the problem of real-time dense depth estimation from monocular images for mobile underwater vehicles. We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions and resolve scale ambiguity. To allow prior inputs of arbitrary sparsity, we apply a dense parameterization method. Our model extends recent state-of-the-art approaches to monocular image-based depth estimation, using an efficient encoder-decoder backbone and a modern lightweight transformer optimization stage to encode global context. The network is trained in a supervised fashion on the forward-looking underwater dataset FLSea. Evaluation results on this dataset demonstrate that fusing the sparse feature priors significantly improves depth prediction accuracy. In addition, without any retraining, our method achieves similar depth prediction accuracy on a downward-looking dataset that we collected with a diver-operated camera rig during a coral reef survey. The method achieves real-time performance, running at 160 FPS on a laptop GPU and 7 FPS on a single CPU core, and is suitable for direct deployment on embedded systems. The implementation of this work is made publicly available at https://github.com/ebnerluca/uw_depth.
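The abstract does not spell out the dense parameterization, so the following is only a minimal sketch of one plausible scheme and not necessarily the one used in the released code. It assumes the sparse priors arrive as (row, column, depth) triples and fills every pixel with the depth of its nearest sample, together with the pixel distance to that sample, so that a network can receive fixed-size inputs regardless of how many features were triangulated. The function name `densify_sparse_prior` and the tuple layout are hypothetical.

```python
import math


def densify_sparse_prior(points, height, width):
    """Turn sparse (row, col, depth) samples into two dense maps.

    Returns (depth_map, dist_map) as nested lists of shape height x width:
    for every pixel, the depth of the nearest sample and the Euclidean
    pixel distance to it. Brute force, O(height * width * len(points)).
    Assumed layout for illustration only.
    """
    if not points:
        raise ValueError("at least one sparse depth sample is required")
    depth_map = [[0.0] * width for _ in range(height)]
    dist_map = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Nearest sample by Euclidean distance in pixel space;
            # ties go to the sample that appears first in `points`.
            pr, pc, pd = min(
                points, key=lambda p: math.hypot(p[0] - r, p[1] - c)
            )
            depth_map[r][c] = pd
            dist_map[r][c] = math.hypot(pr - r, pc - c)
    return depth_map, dist_map


# Two hypothetical triangulated features on a 4 x 4 image.
samples = [(0, 0, 2.0), (3, 3, 5.0)]
depth_map, dist_map = densify_sparse_prior(samples, 4, 4)
```

The two maps have the same shape no matter how many samples are supplied, which is the property the abstract asks of the prior input. A practical version would vectorize the search or use a distance transform instead of the nested loops shown here.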