Self-supervised learning (SSL) methods have become a dominant paradigm for creating general-purpose models whose capabilities transfer to downstream supervised learning tasks. However, most such methods rely on vast amounts of pretraining data. This work introduces Subimage Overlap Prediction, a novel self-supervised pretraining task for semantic segmentation in remote sensing imagery that requires significantly less pretraining imagery. Given an image, a sub-image is extracted and the model is trained to produce a semantic mask marking the location of the extracted sub-image within the original image. We demonstrate that pretraining with this task yields significantly faster convergence and equal or better performance (measured via mIoU) on downstream segmentation. This gap in convergence and performance widens as labeled training data is reduced. We show this across multiple architecture types and multiple downstream datasets. We also show that our method matches or exceeds the performance of other SSL methods while requiring significantly less pretraining data. Code and model weights are provided at \href{https://github.com/sharmalakshay93/subimage-overlap-prediction}{github.com/sharmalakshay93/subimage-overlap-prediction}.
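For concreteness, the construction of one pretraining pair can be sketched as follows. This is a minimal illustration assuming NumPy image arrays; the function name \texttt{make\_overlap\_sample} and its parameters are hypothetical and not taken from the released code.

\begin{verbatim}
import numpy as np

def make_overlap_sample(image, crop_h, crop_w, rng=None):
    """Illustrative sketch (not the authors' exact pipeline): build one
    Subimage Overlap Prediction training pair from a single image.

    Returns the full image, a randomly extracted sub-image, and a binary
    mask over the full image marking where the sub-image was taken from.
    """
    if rng is None:
        rng = np.random.default_rng()
    H, W = image.shape[:2]
    # Sample the top-left corner of the crop uniformly at random.
    top = rng.integers(0, H - crop_h + 1)
    left = rng.integers(0, W - crop_w + 1)
    sub = image[top:top + crop_h, left:left + crop_w]
    # Target mask: 1 inside the cropped region, 0 elsewhere. The model
    # sees (image, sub) and is trained to predict this mask.
    mask = np.zeros((H, W), dtype=np.float32)
    mask[top:top + crop_h, left:left + crop_w] = 1.0
    return image, sub, mask
\end{verbatim}

Under this sketch, each pretraining example is generated on the fly from a single unlabeled image, which is consistent with the task requiring far less pretraining imagery than typical SSL methods.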