Existing literature on adversarial Machine Learning (ML) focuses either on showing attacks that break every ML model, or on defenses that withstand most attacks. Unfortunately, little consideration is given to the actual feasibility of the attack or the defense. Moreover, adversarial samples are often crafted in the "feature-space", making the corresponding evaluations of questionable value. Simply put, the current situation does not allow estimating the actual threat posed by adversarial attacks, leading to a lack of secure ML systems. We aim to clarify such confusion in this paper. By considering the application of ML for Phishing Website Detection (PWD), we formalize the "evasion-space" in which an adversarial perturbation can be introduced to fool an ML-PWD, demonstrating that even perturbations in the "feature-space" are useful. Then, we propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage, and hence intrinsically more attractive for real phishers. After that, we perform the first statistically validated assessment of state-of-the-art ML-PWD against 12 evasion attacks. Our evaluation shows (i) the true efficacy of evasion attempts that are more likely to occur; and (ii) the impact of perturbations crafted in different evasion-spaces. Our realistic evasion attempts induce a statistically significant degradation (3-10% at p<0.05), and their low cost makes them a subtle threat. Notably, however, some ML-PWD are immune to our most realistic attacks (p=0.22). Finally, as an additional contribution of this journal publication, we are the first to consider the intriguing case wherein an attacker introduces perturbations in multiple evasion-spaces at the same time. These new results show that simultaneously applying perturbations in the problem- and feature-space can cause a drop in the detection rate from 0.95 to 0.