While the literature on security attacks and defenses for Machine Learning (ML) systems mostly focuses on unrealistic adversarial examples, recent research has raised concerns about the under-explored field of realistic adversarial attacks and their implications for the robustness of real-world systems. Our paper paves the way for a better understanding of adversarial robustness against realistic attacks and makes two major contributions. First, we conduct a study on three real-world use cases (text classification, botnet detection, malware detection) and five datasets to evaluate whether unrealistic adversarial examples can be used to protect models against realistic ones. Our results reveal discrepancies across the use cases: in some, unrealistic examples are as effective as realistic ones, while in others they offer only limited improvement. Second, to explain these results, we analyze the latent representations of adversarial examples generated with realistic and unrealistic attacks. We identify patterns that distinguish the unrealistic examples that can be used for effective hardening from those that cannot. We release our code, datasets and models to support future research on reducing the gap between unrealistic and realistic adversarial attacks.