ML models are known to be vulnerable to adversarial query attacks. In these attacks, queries are iteratively perturbed towards a particular class without any knowledge of the target model beyond its output. The prevalence of remotely hosted ML classification models and Machine-Learning-as-a-Service platforms means that query attacks pose a real threat to the security of these systems. To counter this threat, several stateful defenses have been proposed in recent years. They detect query attacks and prevent the generation of adversarial examples by monitoring and analyzing the sequence of queries received by the system. However, these defenses rely solely on similarity or out-of-distribution detection methods, which may be effective in other domains. In the malware detection domain, adversarial examples are generated in inherently different ways, and we find that such detection mechanisms are significantly less effective. We therefore present MalProtect, a stateful defense against query attacks in the malware detection domain. MalProtect uses several threat indicators to detect attacks. Our results show that it reduces the evasion rate of adversarial query attacks by more than 80% on Android and Windows malware across a range of attacker scenarios. In the first evaluation of its kind, we show that MalProtect outperforms prior stateful defenses, especially under the peak adversarial threat.
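The similarity-based monitoring that prior stateful defenses rely on can be illustrated with a minimal sketch. This is not MalProtect, nor any published defense. The class name `SimilarityDetector`, the use of Hamming distance over binary feature vectors, and the parameters `history_size`, `k`, and `threshold` are all assumptions chosen purely for illustration.

```python
from collections import deque


def hamming(a, b):
    """Number of positions where two equal-length binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class SimilarityDetector:
    """Toy stateful defense (hypothetical): remembers recent queries and
    flags a new query whose nearest past queries are very close to it."""

    def __init__(self, history_size=100, k=3, threshold=2.0):
        self.history = deque(maxlen=history_size)  # sliding window of past queries
        self.k = k                  # number of nearest neighbours to average over
        self.threshold = threshold  # mean distance below this is treated as suspicious

    def observe(self, query):
        """Record the query and return True if it looks like part of a query attack."""
        distances = sorted(hamming(query, past) for past in self.history)
        self.history.append(tuple(query))
        if len(distances) < self.k:
            return False  # not enough history to judge yet
        mean_knn = sum(distances[: self.k]) / self.k
        return mean_knn < self.threshold


if __name__ == "__main__":
    detector = SimilarityDetector(k=2, threshold=2.0)
    base = [0] * 10
    step1 = [1] + [0] * 9        # one feature flipped
    step2 = [1, 1] + [0] * 8     # a second feature flipped
    for q in (base, step1, step2):
        print(detector.observe(q))
```

An attack that flips a few features per step produces queries that sit close to one another, so the mean distance to the nearest stored queries falls below the threshold. The abstract's argument is that in the malware domain, where adversarial examples are generated differently, a check of this kind alone is significantly less effective.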