Label noise, in the sense of incorrect labels, is present in many real-world data sets and is known to severely limit the generalizability of deep learning models. In the field of remote sensing, however, the automated treatment of label noise in data sets has received little attention to date. In particular, there is a lack of systematic analysis of the performance of data-centric methods that not only cope with label noise but also explicitly identify and isolate noisy labels. In this paper, we examine three such methods and evaluate their behavior under different label noise assumptions. To this end, we inject different types of label noise, with noise levels ranging from 10% to 70%, into two benchmark data sets, and then analyze how well the selected methods filter the label noise and how this affects task performance. Our analyses demonstrate the value of data-centric methods for both aspects: label noise identification and task performance improvement. They also provide insights into which method is the best choice depending on the setting and objective. Finally, we show in which areas research is still needed to transfer data-centric label noise methods to remote sensing data. As such, our work is a step toward bridging the gap between the methodological development of data-centric label noise methods and their use in practical settings in the remote sensing domain.
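The noise-injection step described above can be illustrated with a minimal sketch. It assumes uniform (symmetric) label noise on a single-label classification task, which is only one possible noise type; the abstract does not specify which noise types or data sets are used, so the function name `inject_symmetric_noise`, the synthetic labels, and the class count below are hypothetical.

```python
import numpy as np


def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fixed fraction of labels to a different, uniformly chosen class.

    Hypothetical helper for illustration; not the procedure used in the paper.
    Returns the noisy labels and a boolean mask marking the flipped samples.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()

    # Select exactly round(noise_rate * N) samples to corrupt.
    n_flip = int(round(noise_rate * labels.size))
    flip_idx = rng.choice(labels.size, size=n_flip, replace=False)

    # An offset in [1, num_classes - 1] guarantees the new label differs
    # from the original one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    noisy[flip_idx] = (labels[flip_idx] + offsets) % num_classes

    is_noisy = np.zeros(labels.size, dtype=bool)
    is_noisy[flip_idx] = True
    return noisy, is_noisy


# Synthetic stand-in for a benchmark label vector (assumption: 10 classes).
clean = np.random.default_rng(42).integers(0, 10, size=1000)

for rate in (0.1, 0.3, 0.5, 0.7):
    noisy, mask = inject_symmetric_noise(clean, rate, num_classes=10)
    print(f"target rate {rate:.1f} -> observed rate {(noisy != clean).mean():.3f}")
```

Keeping the ground-truth mask alongside the noisy labels is what makes it possible to score a label noise identification method, for example by computing precision and recall of its flagged samples against `mask`.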