Underwater caves are challenging environments that are crucial for water resource management and for our understanding of hydrogeology and history. Mapping underwater caves is a time-consuming, labor-intensive, and hazardous operation. For autonomous cave mapping by underwater robots, the major challenge lies in vision-based estimation in the complete absence of ambient light, which results in constantly moving shadows due to the motion of the camera-light setup. Thus, detecting and following the caveline as navigation guidance is paramount for robots in autonomous cave mapping missions. In this paper, we present a computationally light caveline detection model based on a novel Vision Transformer (ViT)-based learning pipeline. We address the problem of scarce annotated training data with a weakly supervised formulation in which the learning is reinforced through a series of noisy predictions from intermediate sub-optimal models. We validate the utility and effectiveness of such weak supervision for caveline detection and tracking in three different cave locations in the USA, Mexico, and Spain. Experimental results demonstrate that our proposed model, CL-ViT, balances the robustness-efficiency trade-off, ensuring good generalization performance while offering 10+ FPS on single-board (Jetson TX2) devices.