Existing literature on adversarial Machine Learning (ML) focuses either on showing attacks that break every ML model, or on defenses that withstand most attacks. Unfortunately, little consideration is given to the actual feasibility of the attack or the defense. Moreover, adversarial samples are often crafted in the "feature-space", making the corresponding evaluations of questionable value. Simply put, the current situation does not allow the actual threat posed by adversarial attacks to be estimated, leading to a lack of secure ML systems. We aim to clarify such confusion in this paper. By considering the application of ML for Phishing Website Detection (PWD), we formalize the "evasion-space" in which an adversarial perturbation can be introduced to fool an ML-PWD, demonstrating that even perturbations in the "feature-space" are useful. Then, we propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage, and hence intrinsically more attractive for real phishers. After that, we perform the first statistically validated assessment of state-of-the-art ML-PWD against 12 evasion attacks. Our evaluation shows (i) the true efficacy of the evasion attempts that are more likely to occur; and (ii) the impact of perturbations crafted in different evasion-spaces. Our realistic evasion attempts induce a statistically significant degradation (3-10% at p<0.05), and their low cost makes them a subtle threat. Notably, however, some ML-PWD are immune to our most realistic attacks (p=0.22). Finally, as an additional contribution of this journal publication, we are the first to consider the intriguing case wherein an attacker introduces perturbations in multiple evasion-spaces at the same time. These new results show that simultaneously applying perturbations in the problem-space and the feature-space can cause the detection rate to drop from 0.95 to 0.