Why machine learning models admit adversarial attacks that are imperceptible to humans remains poorly understood from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural, or on-manifold, attacks, which are perceptible to a human or oracle, and unnatural, or off-manifold, attacks, which are not. We argue that the existence of off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that although the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold directions of the data space. Our main results provide an explicit relationship between the $\ell_2$ and $\ell_{\infty}$ attack strengths of on- and off-manifold attacks and the dimension gap.
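To make the role of the dimension gap concrete, the following is a minimal NumPy sketch, not the construction or proof used in this work. It assumes the simplest possible setting: data lying on a $d$-dimensional coordinate subspace of $\mathbb{R}^D$ (the last $D-d$ coordinates are exactly zero), a 2-layer ReLU network trained by plain full-batch gradient descent on squared loss, and no weight decay. The function name `off_manifold_demo` and all hyperparameters are hypothetical choices made for illustration. Under these assumptions, the loss gradient with respect to the first-layer weights attached to the zero coordinates vanishes, so those weights stay at their random initialization, while the input gradient of the trained network can still have a nonzero component along them.

```python
import numpy as np


def off_manifold_demo(d=5, D=50, width=64, n=200, steps=200, lr=0.05, seed=0):
    """Illustrative sketch: clean training on low-dimensional data.

    Assumption: the data manifold is the span of the first d coordinates
    of R^D, so the remaining D - d coordinates of every sample are zero.
    """
    rng = np.random.default_rng(seed)

    # Samples on the d-dimensional subspace; off-manifold coordinates are 0.
    X = np.zeros((n, D))
    X[:, :d] = rng.normal(size=(n, d))
    y = np.sign(X[:, 0])  # labels depend only on an on-manifold coordinate

    # 2-layer ReLU network f(x) = a^T relu(W x) with random initialization.
    W = rng.normal(size=(width, D)) / np.sqrt(D)
    a = rng.normal(size=width) / np.sqrt(width)
    W0 = W.copy()

    for _ in range(steps):
        H = X @ W.T                       # pre-activations, shape (n, width)
        act = (H > 0).astype(float)       # ReLU derivative
        err = np.maximum(H, 0.0) @ a - y  # residuals of the squared loss
        grad_a = np.maximum(H, 0.0).T @ err / n
        # Each row of grad_W is a combination of samples, so its
        # off-manifold columns are exactly zero.
        grad_W = ((err[:, None] * act) * a[None, :]).T @ X / n
        a -= lr * grad_a
        W -= lr * grad_W

    # Input gradient of the trained network at one training point.
    x = X[0]
    g = W.T @ (a * (W @ x > 0))

    return {
        "off_weights_unchanged": bool(np.array_equal(W[:, d:], W0[:, d:])),
        "on_grad_norm": float(np.linalg.norm(g[:d])),
        "off_grad_norm": float(np.linalg.norm(g[d:])),
    }


if __name__ == "__main__":
    for D in (10, 50, 200):
        print(D, off_manifold_demo(d=5, D=D))
```

Running the script with several ambient dimensions `D` at fixed `d` prints the on- and off-manifold norms of the input gradient, which gives a rough empirical view of how sensitivity in the off-manifold directions varies with the gap $D-d$ in this toy setting.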