We introduce a meta-learning algorithm for adversarially robust classification. The proposed method is designed to be as model-agnostic as possible: it optimizes a dataset before the dataset is used in a machine learning system, with the aim of erasing its non-robust features. Once the dataset has been created, in principle no specialized algorithm (beyond standard gradient descent) is needed to train a robust model. We formulate the data optimization procedure as a bi-level optimization problem on kernel regression, using a class of kernels that describe infinitely wide neural networks (Neural Tangent Kernels). We present extensive experiments on standard computer vision benchmarks with a variety of models, demonstrating the effectiveness of our method while also pointing out its current shortcomings. In parallel, we revisit prior work that also addressed data optimization for robust classification \citep{Ily+19}, and show that achieving robustness to adversarial attacks through standard (gradient descent) training on a suitable dataset is more challenging than previously thought.
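As a rough illustration of what a bi-level problem on kernel regression can look like, one plausible way to write it is given below. This is a sketch under stated assumptions rather than the exact objective used in this work: the symbols $\tilde{X}, \tilde{Y}$ (the optimized dataset of $m$ points), $(x_i, y_i)$ (the $n$ original training points), the loss $\ell$, the $\ell_\infty$ budget $\varepsilon$, and the ridge parameter $\lambda$ are all notation introduced here for illustration, and $K$ stands for a Neural Tangent Kernel with associated function space $\mathcal{H}_K$.

\begin{align}
\min_{\tilde{X}} \quad & \sum_{i=1}^{n} \max_{\|\delta_i\|_\infty \le \varepsilon} \ell\big( f_{\tilde{X}}(x_i + \delta_i),\, y_i \big) \\
\text{s.t.} \quad & f_{\tilde{X}} = \arg\min_{f \in \mathcal{H}_K} \; \sum_{j=1}^{m} \big\| f(\tilde{x}_j) - \tilde{y}_j \big\|^2 + \lambda \, \|f\|_{\mathcal{H}_K}^2
\end{align}

Under these assumptions the inner (kernel ridge regression) problem has the closed-form solution
\begin{equation}
f_{\tilde{X}}(x) = K(x, \tilde{X}) \big( K(\tilde{X}, \tilde{X}) + \lambda I \big)^{-1} \tilde{Y},
\end{equation}
so the outer objective can be differentiated with respect to $\tilde{X}$ directly, without unrolling an inner training loop.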