In recent years, machine learning models, especially deep neural networks, have been widely used for classification tasks in the security domain. However, these models have been shown to be vulnerable to adversarial manipulation: small perturbations learned by an adversarial attack model can cause significant changes in the output when they are applied to the input. Most research on adversarial attacks and corresponding defense methods considers only scenarios in which adversarial samples are fed to the model exactly as the attack model generated them. In this study, we explore a more practical scenario in behavior-based authentication, where the adversarial samples are collected from real attackers: the attackers replicate the adversarial samples generated by the model, but only with a certain level of discrepancy. We propose an eXplainable AI (XAI) based defense strategy against adversarial attacks in such scenarios. A feature selector trained with our method can be used as a filter in front of the original authenticator. It filters out features that are more vulnerable to adversarial attacks or irrelevant to authentication, while retaining features that are more robust. Through comprehensive experiments, we demonstrate that our XAI-based defense strategy is effective against adversarial attacks and outperforms other defense strategies, such as adversarial training and defensive distillation.
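As a rough illustration of the filtering idea described above, the following Python sketch applies a feature mask in front of a toy authenticator and models an attacker's imperfect replication as bounded noise. It is not the authors' method. The relevance and vulnerability scores, the two thresholds, the distance-based authenticator, and the uniform replication noise are all hypothetical stand-ins chosen only for this example.

```python
import random
from typing import List, Sequence


def select_features(relevance: Sequence[float], vulnerability: Sequence[float],
                    relevance_min: float, vulnerability_max: float) -> List[int]:
    """Return a 0/1 mask that keeps features which are relevant AND robust.

    Hypothetical inputs: `relevance` stands in for per-feature attribution
    scores from some XAI method, `vulnerability` for how strongly an attack
    perturbs each feature. The thresholds are illustrative only.
    """
    return [1 if r >= relevance_min and v <= vulnerability_max else 0
            for r, v in zip(relevance, vulnerability)]


def apply_filter(x: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Zero out dropped features so the authenticator never sees them."""
    return [xi * m for xi, m in zip(x, mask)]


def authenticate(x: Sequence[float], template: Sequence[float],
                 mask: Sequence[int], threshold: float) -> bool:
    """Toy authenticator: accept if the filtered sample is close to the user's template."""
    fx, ft = apply_filter(x, mask), apply_filter(template, mask)
    dist = sum((a - b) ** 2 for a, b in zip(fx, ft)) ** 0.5
    return dist <= threshold


def replicate(x_adv: Sequence[float], discrepancy: float,
              rng: random.Random) -> List[float]:
    """Model an attacker who reproduces a generated adversarial sample imperfectly:
    each feature is off by up to +/- `discrepancy` (uniform noise is an assumption)."""
    return [xi + rng.uniform(-discrepancy, discrepancy) for xi in x_adv]


if __name__ == "__main__":
    relevance = [0.9, 0.1, 0.8, 0.7]      # hypothetical XAI attributions
    vulnerability = [0.1, 0.2, 0.9, 0.1]  # hypothetical attack sensitivity
    mask = select_features(relevance, vulnerability, 0.5, 0.5)
    print(mask)  # [1, 0, 0, 1]

    template = [1.0, 0.5, 2.0, 0.0]       # enrolled user's behavior profile
    # Hypothetical adversarial target that matches the template only on
    # features 1 and 2, both of which the filter drops.
    x_adv = [3.0, 0.5, 2.0, 2.0]
    x_rep = replicate(x_adv, 0.05, random.Random(0))
    print(authenticate(x_rep, template, mask, threshold=1.0))  # False
```

Because dropped features are zeroed before scoring, an adversarial sample that matches the template only on those features gains nothing, however faithfully the attacker replicates it. How relevance and vulnerability would actually be derived from XAI attributions is the substance of the proposed method and is not reproduced in this sketch.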