The success of the Segment Anything Model (SAM) demonstrates the significance of data-centric machine learning. However, because annotating Remote Sensing (RS) images is difficult and costly, a large amount of valuable RS data remains unlabeled, particularly at the pixel level. In this study, we leverage SAM and existing RS object detection datasets to develop an efficient pipeline for generating a large-scale RS segmentation dataset, dubbed SAMRS. SAMRS surpasses existing high-resolution RS segmentation datasets in size by several orders of magnitude, and it provides object category, location, and instance information that can be used for semantic segmentation, instance segmentation, and object detection, either individually or in combination. We also provide a comprehensive analysis of SAMRS from various aspects. We hope SAMRS will facilitate research in RS segmentation, particularly in large model pre-training.
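To illustrate how per-object category, location, and instance information can serve semantic segmentation, instance segmentation, and object detection at once, the following is a minimal sketch and not the authors' pipeline. It assumes each object already has a binary mask (for example, one produced by prompting SAM with a detection box) and an integer category ID. The function name `derive_labels`, the argument layout, and the background value are hypothetical choices made for this example.

```python
import numpy as np


def derive_labels(instance_masks, category_ids, background=0):
    """Derive three label types from per-object masks (illustrative only).

    instance_masks: array-like of shape (N, H, W), one binary mask per object.
    category_ids:   length-N sequence of positive integer class IDs.
    Returns (semantic_map, instance_map, boxes), where boxes holds
    (x_min, y_min, x_max, y_max, category) tuples in pixel coordinates.
    """
    instance_masks = np.asarray(instance_masks, dtype=bool)
    _, height, width = instance_masks.shape

    semantic_map = np.full((height, width), background, dtype=np.int32)
    instance_map = np.zeros((height, width), dtype=np.int32)
    boxes = []

    # Instance IDs start at 1 so that 0 can mean "no object".
    for inst_id, (mask, cat) in enumerate(zip(instance_masks, category_ids), start=1):
        if not mask.any():
            continue  # an empty mask carries no label information

        # Where masks overlap, later objects overwrite earlier ones.
        semantic_map[mask] = cat
        instance_map[mask] = inst_id

        # The tight bounding box of the mask gives the detection label.
        ys, xs = np.nonzero(mask)
        boxes.append((int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()), int(cat)))

    return semantic_map, instance_map, boxes
```

Given a stack of masks and their class IDs, `semantic_map` can train a semantic segmentation model, `instance_map` together with the class IDs can train an instance segmentation model, and `boxes` can train a detector, which mirrors the individual or combined usage described above.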