This work explores the capabilities of the pre-trained CLIP vision-language model to identify satellite images affected by clouds. Several approaches to using the model for cloud presence detection are proposed and evaluated, including purely zero-shot operation with text prompts and several fine-tuning approaches. Furthermore, the transferability of the methods across different datasets and sensor types (Sentinel-2 and Landsat-8) is tested. The results show that CLIP can achieve non-trivial performance on the cloud presence detection task, with an apparent capability to generalise across sensing modalities and sensing bands. It is also found that a low-cost fine-tuning stage leads to a strong increase in the true negative rate. The results demonstrate that the representations learned by the CLIP model can be useful for satellite image processing tasks involving clouds.
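As an illustration of the zero-shot setting mentioned above, the following is a minimal sketch rather than the implementation evaluated in this work. It assumes that image and text embeddings have already been obtained from a CLIP-style encoder for two hypothetical prompts (one describing a clear scene, one describing a cloudy scene). The function names, the prompt ordering, and the temperature of 100 are assumptions made for the example. The second helper computes the true negative rate, the metric referred to above.

```python
import numpy as np


def zero_shot_cloud_probability(image_embs, text_embs, temperature=100.0):
    """Return P(cloudy) per image from CLIP-style embeddings.

    image_embs: (n_images, d) array of image embeddings.
    text_embs:  (2, d) array; row 0 is the "clear" prompt and row 1 the
                "cloudy" prompt (hypothetical ordering for this sketch).
    temperature: assumed logit scale applied to the cosine similarities.
    """
    image_embs = np.asarray(image_embs, dtype=float)
    text_embs = np.asarray(text_embs, dtype=float)

    # L2-normalise so that the dot product equals cosine similarity.
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)

    # Scaled similarities, followed by a numerically stable softmax.
    logits = temperature * image_embs @ text_embs.T
    logits = logits - logits.max(axis=1, keepdims=True)
    exp_logits = np.exp(logits)
    probs = exp_logits / exp_logits.sum(axis=1, keepdims=True)
    return probs[:, 1]


def true_negative_rate(y_true, y_pred):
    """Fraction of cloud-free images (label 0) predicted as cloud-free."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    negatives = ~y_true
    return float((negatives & ~y_pred).sum() / negatives.sum())
```

Thresholding the returned probability (for example at 0.5) gives a binary cloud-presence prediction that can be passed to `true_negative_rate` together with ground-truth labels.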