Machine Learning for Shipwreck Segmentation from Side Scan Sonar Imagery: Dataset and Benchmark

Open-source benchmark datasets have been critical to advancing machine learning for robot perception in terrestrial applications. Benchmark datasets enable the widespread development of state-of-the-art machine learning methods, which require large datasets for training, validation, and thorough comparison to competing approaches. Underwater environments impose several operational challenges that hinder efforts to collect large benchmark datasets for marine robot perception. Furthermore, targets of interest are sparse relative to the size of the search space, which increases the time and cost required to collect useful datasets for a specific task. As a result, labeled benchmark datasets for underwater applications are scarce. We present the AI4Shipwrecks dataset, which consists of 24 distinct shipwreck sites totaling 286 high-resolution labeled side scan sonar images, to advance the state of the art in autonomous sonar image understanding. We leverage the unique abundance of targets in Thunder Bay National Marine Sanctuary in Lake Huron, MI, to collect and compile a sonar imagery benchmark dataset through surveys with an autonomous underwater vehicle (AUV). We consulted expert marine archaeologists to label the robotically gathered data. We then use this dataset to run benchmark experiments comparing state-of-the-art supervised segmentation methods, and we present insights on opportunities and open challenges for the field. The dataset and benchmarking tools are released as open source to spur innovation in machine learning for Great Lakes and ocean exploration. The dataset and accompanying software are available at https://umfieldrobotics.github.io/ai4shipwrecks/.

