Evasion attacks (EAs) are used to test the robustness of trained neural networks by distorting input data to mislead the model into incorrect classifications. Creating these attacks is challenging, especially given the ever-increasing complexity of models and datasets. In this work, we introduce a self-supervised, computationally economical method for generating adversarial examples, designed for the unseen black-box setting. Adapting techniques from representation learning, our method generates on-manifold EAs that are encouraged to resemble the data distribution. These attacks are comparable in effectiveness to the state of the art when attacking the model they were trained on, but are significantly more effective when attacking unseen models, because they depend more on the data than on the model itself. Our experiments consistently demonstrate that the method is effective across various models, unseen data categories, and even defended models, suggesting a significant role for on-manifold EAs when targeting unseen models.