This paper presents refined BigEarthNet (reBEN), a large-scale, multi-modal remote sensing dataset constructed to support deep learning (DL) studies in remote sensing image analysis. The reBEN dataset consists of 549,488 pairs of Sentinel-1 and Sentinel-2 image patches. To construct reBEN, we start from the Sentinel-1 and Sentinel-2 tiles used to construct the BigEarthNet dataset and divide them into patches of size 1200 m × 1200 m. We apply atmospheric correction to the Sentinel-2 patches using the latest version of the sen2cor tool, which yields higher-quality patches than those in BigEarthNet. Each patch is then associated with a pixel-level reference map and scene-level multi-labels, which makes reBEN suitable for both pixel-based and scene-based learning tasks. The labels are derived from the most recent CORINE Land Cover (CLC) map of 2018, using the same 19-class nomenclature as BigEarthNet. Using the most recent CLC map overcomes the label noise present in BigEarthNet. Furthermore, we introduce a new geographical-based split assignment algorithm that significantly reduces the spatial correlation among the train, validation, and test sets compared to the splits in BigEarthNet. This increases the reliability of the evaluation of DL models. To minimize DL model training time, we introduce software tools that convert the reBEN dataset into a DL-optimized data format. In our experiments, we show the potential of reBEN for multi-modal multi-label image classification problems by considering several state-of-the-art DL models. The pre-trained model weights, associated code, and complete dataset are available at https://bigearth.net.
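The relation between the pixel-level reference map and the scene-level multi-labels of a patch can be illustrated with a minimal sketch. It assumes that classes are stored as integer indices 0 to 18 and that a class counts as a scene label when it appears in at least one pixel. Both are illustrative assumptions, as are the function name and the toy data; this is not necessarily the procedure used to build reBEN.

```python
from typing import List, Sequence

NUM_CLASSES = 19  # size of the 19-class nomenclature named in the abstract


def scene_labels_from_reference_map(
    reference_map: Sequence[Sequence[int]],
    num_classes: int = NUM_CLASSES,
) -> List[int]:
    """Return a multi-hot vector marking every class present in the map.

    Hypothetical helper: the class encoding and the "present in at least
    one pixel" rule are assumptions made for this sketch.
    """
    multi_hot = [0] * num_classes
    for row in reference_map:
        for class_index in row:
            # Assumption: values outside 0..num_classes-1 mean "unlabeled"
            # and are skipped.
            if 0 <= class_index < num_classes:
                multi_hot[class_index] = 1
    return multi_hot


# Toy 4 x 4 reference map; a real 1200 m x 1200 m patch would be larger.
toy_map = [
    [0, 0, 3, 3],
    [0, 0, 3, 3],
    [7, 7, 3, 3],
    [7, 7, -1, 3],
]
labels = scene_labels_from_reference_map(toy_map)
present = [i for i, flag in enumerate(labels) if flag]
print(present)  # [0, 3, 7]
```

The multi-hot vector is the usual target format for multi-label classification losses, while the reference map itself can serve directly as a target for pixel-based tasks.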