The ocean is warming and acidifying, increasing the risk of mass mortality events for temperature-sensitive shellfish such as oysters. This highlights the need for persistent, wide-area, and low-cost benthic monitoring. However, human labor is costly, and long-duration underwater work is highly hazardous, making robotic solutions a safer and more efficient option. To enable underwater robots to make real-time, environment-aware decisions without human intervention, we must equip them with an intelligent "brain." To this end, we present DREAM, a Vision Language Model (VLM)-guided autonomy framework for long-term underwater exploration and habitat monitoring. Our results show that the framework efficiently finds and explores target objects (e.g., oysters, shipwrecks) without prior location information. In the oyster-monitoring task, it takes 31.5% less time than the previous baseline while detecting the same number of oysters. Compared to the vanilla VLM, it uses 23% fewer steps while covering 8.88% more oysters. In shipwreck scenes, our framework successfully explores and maps the wreck without collisions, requiring 27.5% fewer steps than the vanilla model and achieving 100% coverage, whereas the vanilla model averages 60.23% coverage in our shipwreck environments.