This work investigates adversarial training in the context of margin-based linear classifiers in the high-dimensional regime where the dimension $d$ and the number of data points $n$ diverge at a fixed ratio $\alpha = n / d$. We introduce a tractable mathematical model in which the interplay between the data and adversarial attacker geometries can be studied, while capturing the core phenomenology observed in the adversarial robustness literature. Our main theoretical contribution is an exact asymptotic description of the sufficient statistics for the adversarial empirical risk minimiser, under generic convex and non-increasing losses. Our results allow us to precisely characterise which directions in the data are associated with a higher generalisation/robustness trade-off, as defined by a robustness and a usefulness metric. In particular, we unveil the existence of directions which can be defended without penalising accuracy. Finally, we show the advantage of defending non-robust features during training, identifying uniform protection as an inherently effective defence mechanism.
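As a minimal sketch of the kind of objective referred to above, consider the following assumptions, none of which are stated in the abstract: labels $y_i \in \{-1, +1\}$, inputs $x_i \in \mathbb{R}^d$, weights $w \in \mathbb{R}^d$, a convex non-increasing margin loss $g$, and an attacker restricted to an $\ell_2$ ball of radius $\varepsilon$. The symbols $g$, $\varepsilon$, $\delta_i$ and $\hat{w}$ are illustrative, and the paper's own attack geometry and normalisation may differ. Under these assumptions the adversarial empirical risk minimiser can be written as

$$
\hat{w} \in \arg\min_{w \in \mathbb{R}^d} \sum_{i=1}^{n} \max_{\|\delta_i\|_2 \le \varepsilon} g\big(y_i\, w^\top (x_i + \delta_i)\big)
= \arg\min_{w \in \mathbb{R}^d} \sum_{i=1}^{n} g\big(y_i\, w^\top x_i - \varepsilon \|w\|_2\big).
$$

The equality uses only that $g$ is non-increasing: the worst-case perturbation is the one that minimises the margin, and $\min_{\|\delta_i\|_2 \le \varepsilon} y_i\, w^\top \delta_i = -\varepsilon \|w\|_2$. For a linear classifier the inner maximisation therefore reduces to a norm penalty inside the loss, and a different attack norm would replace $\|w\|_2$ with the corresponding dual norm.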