This work investigates adversarial training for margin-based linear classifiers in the high-dimensional regime where the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n / d$. We introduce a tractable mathematical model in which the interplay between the data and adversarial attacker geometries can be studied, while capturing the core phenomenology observed in the adversarial robustness literature. Our main theoretical contribution is an exact asymptotic description of the sufficient statistics for the adversarial empirical risk minimiser, under generic convex and non-increasing losses, for a Block Feature Model. Our results allow us to precisely characterise which directions in the data are associated with a higher generalisation/robustness trade-off, as defined by a robustness and a usefulness metric. We show that the presence of multiple different feature types is crucial to the high sample-complexity performance of adversarial training. In particular, we unveil the existence of directions which can be defended without penalising accuracy. Finally, we show the advantage of defending non-robust features during training, identifying uniform protection as an inherently effective defence mechanism.