Machine-learning models are known to be vulnerable to evasion attacks that perturb model inputs to induce misclassifications. In this work, we identify real-world scenarios where the true threat cannot be assessed accurately by existing attacks. Specifically, we find that conventional metrics measuring targeted and untargeted robustness do not appropriately reflect a model's ability to withstand attacks from one set of source classes to another set of target classes. To address the shortcomings of existing methods, we formally define a new metric, termed group-based robustness, that complements existing metrics and is better suited for evaluating model performance in certain attack scenarios. We show empirically that group-based robustness allows us to distinguish between models' vulnerability to specific threat models in situations where traditional robustness metrics do not apply. Moreover, to measure group-based robustness efficiently and accurately, we (1) propose two loss functions and (2) identify three new attack strategies. We show empirically that, at comparable success rates, finding evasive samples using our new loss functions saves computation by a factor as large as the number of targeted classes, and finding evasive samples using our new attack strategies saves time by up to 99\% compared to brute-force search methods. Finally, we propose a defense method that increases group-based robustness by up to 3.52$\times$.
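To make the idea of attacks "from one set of source classes to another set of target classes" concrete, the following is a minimal sketch, not the paper's formal definition, which the abstract does not reproduce. It assumes that group-based robustness can be approximated as the fraction of source-class inputs whose prediction under attack stays outside the target set. The function names (`group_attack_success_rate`, `group_based_robustness`, `group_margin_loss`) and the margin-style loss are hypothetical illustrations. The loss only shows how a single objective might cover every target class at once, rather than requiring one targeted attack per class; it is not claimed to be one of the two proposed loss functions.

```python
import numpy as np


def group_attack_success_rate(true_labels, adv_preds, source_classes, target_classes):
    """Fraction of source-class samples whose prediction under attack
    falls in the target set (assumed reading of a group-based attack)."""
    true_labels = np.asarray(true_labels)
    adv_preds = np.asarray(adv_preds)
    # Only samples whose true class is in the source set are considered.
    from_source = np.isin(true_labels, list(source_classes))
    if not from_source.any():
        raise ValueError("no samples belong to the source classes")
    # An attack counts as successful if the prediction lands in ANY target class.
    hit_target = np.isin(adv_preds[from_source], list(target_classes))
    return float(hit_target.mean())


def group_based_robustness(true_labels, adv_preds, source_classes, target_classes):
    """Assumed metric: 1 minus the group attack success rate."""
    return 1.0 - group_attack_success_rate(
        true_labels, adv_preds, source_classes, target_classes
    )


def group_margin_loss(logits, target_classes):
    """Hypothetical margin loss over a target group: the best non-target logit
    minus the best target logit. It is negative exactly when the top-scoring
    class is in the target group, so minimizing it attacks all target classes
    with one objective."""
    logits = np.asarray(logits, dtype=float)
    mask = np.zeros(logits.shape[-1], dtype=bool)
    mask[list(target_classes)] = True
    best_target = logits[..., mask].max(axis=-1)
    best_other = logits[..., ~mask].max(axis=-1)
    return best_other - best_target


if __name__ == "__main__":
    true_labels = [0, 0, 1, 2, 1, 3]
    adv_preds = [2, 0, 3, 2, 1, 3]  # predictions after a (hypothetical) attack
    source, target = {0, 1}, {2, 3}
    print(group_based_robustness(true_labels, adv_preds, source, target))
    print(group_margin_loss([1.0, 0.5, 3.0, 2.0], target))
```

In this toy run, four samples belong to the source classes and two of them are pushed into the target set, so the assumed robustness value is 0.5. The example logits already favor a target class, so the margin loss is negative.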