Adversarial contrastive learning (ACL) requires no expensive data annotations, yet it produces robust representations that withstand adversarial attacks and generalize to a wide range of downstream tasks. However, ACL needs substantial running time to generate adversarial variants of all training data, which limits its scalability to large datasets. To speed up ACL, this paper proposes a robustness-aware coreset selection (RCS) method. RCS does not require label information; it searches for an informative subset that minimizes a representational divergence, defined as the distance between the representations of natural data and those of their virtual adversarial variants. The naive solution of traversing all possible subsets is computationally prohibitive. We therefore theoretically transform RCS into a surrogate submodular maximization problem, for which greedy search is an efficient solution with an optimality guarantee for the original problem. Empirically, our comprehensive results corroborate that RCS speeds up ACL by a large margin without significantly hurting robustness transferability. Notably, to the best of our knowledge, we are the first to conduct ACL efficiently on the large-scale ImageNet-1K dataset and obtain an effective robust representation via RCS.
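To illustrate the greedy search that the abstract refers to, the following is a minimal sketch of greedy selection under a cardinality budget. It does not implement the RCS objective: the toy set-coverage function stands in for a monotone submodular surrogate, and the names `greedy_select`, `coverage_gain`, and `toy_sets` are hypothetical and chosen only for this example.

```python
from typing import Callable, FrozenSet, List, Set


def greedy_select(n_items: int, budget: int,
                  gain: Callable[[FrozenSet[int], int], float]) -> List[int]:
    """Pick up to `budget` items, each time adding the one with the largest marginal gain."""
    selected: List[int] = []
    for _ in range(min(budget, n_items)):
        current = frozenset(selected)
        candidates = [i for i in range(n_items) if i not in current]
        # Greedy step: choose the candidate whose addition helps the objective most.
        best = max(candidates, key=lambda i: gain(current, i))
        selected.append(best)
    return selected


def coverage_gain(sets: List[Set[int]]) -> Callable[[FrozenSet[int], int], float]:
    """Marginal gain of a toy set-coverage objective (an assumed stand-in, not the RCS objective)."""
    def gain(current: FrozenSet[int], i: int) -> float:
        covered: Set[int] = set()
        for j in current:
            covered |= sets[j]
        # Number of elements that item i would newly cover.
        return float(len(sets[i] - covered))
    return gain


# Hypothetical toy data: each "item" covers a few elements.
toy_sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
subset = greedy_select(len(toy_sets), budget=2, gain=coverage_gain(toy_sets))
print(subset)
```

Each round evaluates the marginal gain of every remaining candidate, so the cost grows with the number of items times the budget rather than with the number of possible subsets, which is what makes this kind of search tractable compared with exhaustive enumeration.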