With the recent advancements in machine learning (ML), numerous ML-based approaches have been widely applied in software analytics tasks to streamline software development and maintenance processes. Nevertheless, studies indicate that despite their potential usefulness, ML models are vulnerable to adversarial attacks, which may cause significant monetary losses in these processes. As a result, the robustness of ML models against adversarial attacks must be assessed before they are deployed in software analytics tasks. Although several adversarial attack techniques are available for software analytics tasks, attacks that leverage ML explainability remain largely unexplored. Therefore, this study investigates the relationship between ML explainability and adversarial attacks to measure the robustness of ML models in software analytics tasks. In addition, unlike most existing attacks that directly perturb the input space, our attack approach perturbs the feature space. Our extensive experiments, involving six datasets, three ML explainability techniques, and seven ML models, demonstrate that ML explainability can be used to conduct successful adversarial attacks on ML models in software analytics tasks by modifying only the top 1-3 features ranked as most important by the explainability techniques. Consequently, the ML models under attack fail to accurately predict up to 86.6% of the instances that were correctly predicted before the attacks, indicating the models' low robustness against such attacks. Finally, our proposed technique demonstrates promising results compared to four state-of-the-art adversarial attack techniques targeting tabular data.
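For illustration, the following is a minimal sketch of the core idea, not the paper's exact attack: permutation importance stands in for the explainability step (the paper uses dedicated explainability techniques), the perturbation is a simple standard-deviation shift in feature space, and all dataset, model, and parameter choices here are illustrative assumptions.

```python
# Hypothetical sketch: explainability-guided feature-space attack.
# Rank features by an explainability score, perturb only the top-k
# features of correctly classified instances, and measure how many
# previously correct predictions flip.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# 1. Rank features by a global explainability score
#    (permutation importance as a stand-in explainer).
imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
top_k = np.argsort(imp.importances_mean)[::-1][:3]   # top 1-3 features

# 2. Perturb only those features, only for instances the model gets right.
correct = model.predict(X_te) == y_te
X_adv = X_te[correct].copy()
X_adv[:, top_k] += 3.0 * X_tr[:, top_k].std(axis=0)  # feature-space shift

# 3. Attack success = share of previously correct predictions that flip.
flipped = model.predict(X_adv) != y_te[correct]
print(f"attack success rate: {flipped.mean():.1%}")
```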