Explainability Guided Adversarial Evasion Attacks on Malware Detectors

As the security of Artificial Intelligence (AI) becomes paramount, research on crafting and inserting optimal adversarial perturbations has become increasingly critical. In the malware domain, adversarial sample generation relies heavily on the accuracy and placement of the crafted perturbations, with the goal of evading a trained classifier. This work applies explainability techniques to enhance adversarial evasion attacks on a machine-learning-based Windows PE malware detector. An explainability tool identifies the regions of a PE malware file that most strongly influence the decision of a given malware detector, and the same regions can therefore be targeted for perturbation injection to maximize efficiency. Profiling all regions of a PE malware file by their impact on the detector's decision enables an efficient strategy for identifying the optimal injection location. The strategy should account for both a region's significance in influencing the detector's decision and how sensitive the file's integrity is to modification of that region. To assess the utility of explainable AI in crafting adversarial samples of Windows PE malware, we use the DeepExplainer module of SHAP to determine the contribution of each region of a PE malware file to its detection by MalConv, a CNN-based malware detector. Furthermore, we analyze the significance of SHAP values at a more granular level by subdividing each section of the Windows PE file into small subsections. We then perform an adversarial evasion attack on the subsections based on the SHAP values of their byte sequences.
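The subsection-level profiling described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that per-byte attribution values (for example, SHAP values already reduced to one number per byte) are available as a plain list, that positive values push the detector toward the "malicious" label, and that section boundaries are given as byte ranges. The names `Subsection`, `subsection_scores`, and `rank_subsections`, the toy attribution values, and the section layout are all hypothetical. The sketch also ignores how sensitive the file's integrity is to modifying each region, which the strategy above says must be considered as well.

```python
from typing import NamedTuple


class Subsection(NamedTuple):
    section: str   # name of the enclosing PE section, e.g. ".text"
    start: int     # byte offset of the first byte (inclusive)
    end: int       # byte offset one past the last byte (exclusive)
    score: float   # summed attribution over [start, end)


def subsection_scores(attributions, sections, chunk_size):
    """Split each section into fixed-size chunks and sum attributions per chunk.

    attributions: per-byte attribution values (a stand-in for SHAP values).
    sections: iterable of (name, start, end) byte ranges, end exclusive.
    chunk_size: subsection length in bytes (the last chunk may be shorter).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    out = []
    for name, start, end in sections:
        end = min(end, len(attributions))  # clip to the bytes we have values for
        for lo in range(start, end, chunk_size):
            hi = min(lo + chunk_size, end)
            out.append(Subsection(name, lo, hi, sum(attributions[lo:hi])))
    return out


def rank_subsections(subsections):
    """Order subsections from most to least positive summed attribution."""
    return sorted(subsections, key=lambda s: s.score, reverse=True)


# Toy data (hypothetical): 12 bytes split across two sections.
attributions = [0.1, 0.2, -0.1, 0.0, 0.5, 0.5, 0.0, 0.0, -0.3, -0.2, 0.1, 0.1]
sections = [(".text", 0, 8), (".data", 8, 12)]

ranked = rank_subsections(subsection_scores(attributions, sections, chunk_size=4))
top = ranked[0]
print(top.section, top.start, top.end)
```

In this toy case the second 4-byte chunk of `.text` has the largest summed attribution, so it ranks first as a candidate region. Summing per chunk is only one possible aggregation; a mean or a sum of absolute values would rank regions differently.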
