We study three models of adversarial training in multiclass classification, each designed to construct classifiers that are robust to adversarial perturbations of the data in the agnostic-classifier setting. We prove the existence of Borel measurable robust classifiers in each model and provide a unified perspective on the adversarial training problem, expanding the connections with optimal transport initiated by the authors in previous work and developing new connections between adversarial training in the multiclass setting and total variation regularization. As a corollary of our results, we prove the existence of Borel measurable solutions to the agnostic adversarial training problem in the binary classification setting. This improves on existing results in the adversarial training literature, where robust classifiers were only known to exist within the enlarged universal $\sigma$-algebra of the feature space.