Recent years have witnessed the remarkable success of deep learning in remote sensing image interpretation, driven by the availability of large-scale benchmark datasets. However, this reliance on massive training data also brings two major challenges: (1) high storage and computational costs, and (2) the risk of data leakage, especially when sensitive categories are involved. To address these challenges, this study introduces the concept of dataset distillation into the field of remote sensing image interpretation for the first time. Specifically, we train a text-to-image diffusion model to condense a large-scale remote sensing dataset into a compact and representative distilled dataset. To improve the discriminative quality of the synthesized samples, we propose a classifier-driven guidance mechanism that injects a classification consistency loss from a pre-trained classifier into the diffusion training process. Moreover, considering the rich semantic complexity of remote sensing imagery, we perform latent-space clustering on the training samples to select representative and diverse prototypes as visual style guidance, while a vision-language model provides aggregated text descriptions. Experiments on three high-resolution remote sensing scene classification benchmarks show that the proposed method distills realistic and diverse samples for downstream model training. Code and pre-trained models are available at https://github.com/YonghaoXu/DPD.
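For intuition, the sketch below shows one way a classification consistency loss could be combined with the standard denoising objective during latent diffusion training, as described above. It is a minimal PyTorch sketch assuming a diffusers-style UNet, VAE, and noise scheduler; all names (`unet`, `vae`, `classifier`, `lambda_cls`) and the one-step x0 estimate are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of classifier-driven guidance in diffusion training.
# Assumes diffusers-style components; `classifier` is a frozen pre-trained
# scene classifier and `lambda_cls` is an assumed weighting hyperparameter.
import torch
import torch.nn.functional as F

def training_step(unet, vae, classifier, scheduler, latents, text_emb, labels,
                  lambda_cls=0.1):
    """One diffusion training step with an added classification consistency loss."""
    noise = torch.randn_like(latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (latents.shape[0],), device=latents.device)
    noisy = scheduler.add_noise(latents, noise, t)

    # Standard denoising objective: predict the injected noise.
    noise_pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    loss_diff = F.mse_loss(noise_pred, noise)

    # One-step estimate of the clean latent, decoded to image space so the
    # frozen classifier can score the synthesized sample.
    alpha_bar = scheduler.alphas_cumprod.to(latents.device)[t].view(-1, 1, 1, 1)
    latents_x0 = (noisy - (1 - alpha_bar).sqrt() * noise_pred) / alpha_bar.sqrt()
    images_x0 = vae.decode(latents_x0 / vae.config.scaling_factor).sample

    # Classification consistency loss: the synthesized sample should be
    # recognized as its target scene category; gradients reach the UNet
    # through the x0 estimate.
    logits = classifier(images_x0)
    loss_cls = F.cross_entropy(logits, labels)

    return loss_diff + lambda_cls * loss_cls
```

Only the UNet receives gradients in this sketch; the VAE and classifier stay frozen, so the consistency term steers the denoiser toward samples the pre-trained classifier assigns to the correct scene category.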