Cross-Domain Few-Shot Learning (CD-FSL) is a recently emerging task that tackles few-shot learning across different domains. It aims to transfer prior knowledge learned on a source dataset to novel target datasets. The main challenge of CD-FSL is the large domain gap between datasets. Critically, this domain gap stems from changes in visual style, and wave-SAN empirically shows that spanning the style distribution of the source data helps alleviate the issue. However, wave-SAN simply swaps the styles of two images. Such a vanilla operation makes the generated styles ``real'' and ``easy'': they still fall within the original set of source styles. Thus, inspired by vanilla adversarial learning, we propose a novel model-agnostic meta Style Adversarial training (StyleAdv) method, together with a novel style adversarial attack method, for CD-FSL. In particular, our style attack method synthesizes both ``virtual'' and ``hard'' adversarial styles for model training. This is achieved by perturbing the original style with the signed style gradients. By continually attacking styles and forcing the model to recognize these challenging adversarial styles, our model gradually becomes robust to visual styles, which boosts its generalization ability on novel target datasets. Besides the typical CNN-based backbone, we also apply StyleAdv to a large-scale pretrained vision transformer. Extensive experiments on eight different target datasets show the effectiveness of our method. Whether built upon ResNet or ViT, we achieve a new state of the art for CD-FSL. Code is available at https://github.com/lovelyqian/StyleAdv-CDFSL.
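The style attack summarized in the abstract, perturbing the original style with the signed style gradients, can be illustrated with a minimal NumPy sketch. This is not the released StyleAdv code. It assumes that "style" means the channel-wise mean and standard deviation of a feature map, that the model is a toy linear softmax classifier (chosen so the gradient can be written by hand), and that a single step of size `step` is taken. The names `style_stats`, `style_attack`, `W`, and `step` are hypothetical and introduced only for illustration.

```python
import numpy as np


def style_stats(feat, eps=1e-5):
    """Channel-wise mean and std of a (C, H, W) feature map (assumed 'style')."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = np.sqrt(feat.var(axis=(1, 2), keepdims=True) + eps)
    return mu, sigma


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def cross_entropy(feat, W, label):
    """Loss of a toy linear classifier applied to the flattened feature map."""
    p = softmax(W @ feat.ravel())
    return -np.log(p[label])


def style_attack(feat, W, label, step=0.1):
    """One signed-gradient step on the style (mean, std) of `feat`.

    Assumption: the classifier is linear, so the gradient of the loss
    with respect to the feature map is W^T (softmax(logits) - onehot).
    """
    mu, sigma = style_stats(feat)
    normed = (feat - mu) / sigma  # style-free content, kept fixed

    # Gradient of the cross-entropy loss with respect to the feature map.
    p = softmax(W @ feat.ravel())
    p[label] -= 1.0
    grad_feat = (W.T @ p).reshape(feat.shape)

    # Chain rule through feat = sigma * normed + mu.
    grad_mu = grad_feat.sum(axis=(1, 2), keepdims=True)
    grad_sigma = (grad_feat * normed).sum(axis=(1, 2), keepdims=True)

    # Move the style in the direction that increases the loss.
    mu_adv = mu + step * np.sign(grad_mu)
    sigma_adv = sigma + step * np.sign(grad_sigma)

    # Re-apply the adversarial style to the unchanged content.
    return sigma_adv * normed + mu_adv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(4, 3, 3))   # toy feature map: 4 channels, 3x3
    W = rng.normal(size=(5, feat.size))  # toy 5-class linear classifier
    adv = style_attack(feat, W, label=2)
    print("clean loss:", cross_entropy(feat, W, 2))
    print("adversarial-style loss:", cross_entropy(adv, W, 2))
```

Because the step follows the sign of the loss gradient rather than copying the style of another source image, the resulting style need not belong to the original set of source styles, and it is chosen to be harder for the current model. In a real pipeline the gradient would come from automatic differentiation through the backbone rather than from a hand-derived formula.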