Adversarial contrastive learning (ACL) requires no expensive data annotations, yet it produces robust representations that withstand adversarial attacks and generalize to a wide range of downstream tasks. However, ACL requires substantial running time to generate adversarial variants of all training data, which limits its scalability to large datasets. To speed up ACL, this paper proposes a robustness-aware coreset selection (RCS) method. RCS does not require label information and searches for an informative subset that minimizes a representational divergence, defined as the distance between the representations of natural data and those of their virtual adversarial variants. Solving RCS naively by traversing all possible subsets is computationally prohibitive. Therefore, we theoretically transform RCS into a surrogate problem of submodular maximization, for which greedy search is an efficient solution with an optimality guarantee for the original problem. Empirically, our comprehensive results corroborate that RCS can speed up ACL by a large margin without significantly hurting the robustness transferability. Notably, to the best of our knowledge, we are the first to conduct ACL efficiently on the large-scale ImageNet-1K dataset to obtain an effective robust representation via RCS. Our source code is at https://github.com/GodXuxilie/Efficient_ACL_via_RCS.
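To make the greedy search over subsets concrete, the following is a minimal sketch of greedy selection under a cardinality budget. It is not the RCS implementation: the helper names `greedy_select` and `facility_location` are hypothetical, and the toy facility-location objective and the 6-item similarity matrix are assumptions standing in for the surrogate objective derived from the representational divergence. The sketch only shows why greedy search avoids enumerating all subsets: each round scores every remaining item by its marginal gain and keeps the best one.

```python
from typing import Callable, List, Sequence, Set


def greedy_select(
    n_items: int,
    budget: int,
    set_value: Callable[[Set[int]], float],
) -> List[int]:
    """Pick up to `budget` indices, each time adding the item with the largest marginal gain."""
    selected: List[int] = []
    chosen: Set[int] = set()
    current = set_value(chosen)
    for _ in range(min(budget, n_items)):
        best_item, best_gain = -1, float("-inf")
        for i in range(n_items):
            if i in chosen:
                continue
            # Marginal gain of adding item i to the current subset.
            gain = set_value(chosen | {i}) - current
            if gain > best_gain:
                best_item, best_gain = i, gain
        chosen.add(best_item)
        selected.append(best_item)
        current += best_gain
    return selected


def facility_location(
    similarity: Sequence[Sequence[float]],
) -> Callable[[Set[int]], float]:
    """Toy monotone submodular objective (an assumption, not the RCS objective):
    each item is credited with its similarity to the closest selected item."""
    def value(subset: Set[int]) -> float:
        if not subset:
            return 0.0
        return sum(max(row[j] for j in subset) for row in similarity)
    return value


# Hypothetical similarity matrix: items 0-2 form one cluster (centre 1),
# items 3-5 form a tighter cluster (centre 4).
similarity = [
    [1.0, 0.8, 0.5, 0.1, 0.1, 0.1],
    [0.8, 1.0, 0.8, 0.1, 0.1, 0.1],
    [0.5, 0.8, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.9, 0.6],
    [0.1, 0.1, 0.1, 0.9, 1.0, 0.9],
    [0.1, 0.1, 0.1, 0.6, 0.9, 1.0],
]
coreset = greedy_select(n_items=6, budget=2, set_value=facility_location(similarity))
print(coreset)  # -> [4, 1]
```

With a budget of 2, the sketch selects one centre per cluster, using 11 objective evaluations (6 in the first round, 5 in the second) rather than scoring all 15 two-item subsets. The gap widens rapidly as the dataset and budget grow.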