Machine-learning-based malware detectors are increasingly vulnerable to adversarial examples. Traditional defenses, such as one-shot adversarial training, often fail against adaptive attackers who use reinforcement learning to bypass detection. This paper proposes a robust defense framework based on bilevel optimization that explicitly models the strategic interaction between a defender and an attacker as an adversarial co-evolutionary process. We evaluate our approach using the MAB-malware framework against three distinct malware families: Mokes, Strab, and DCRat. Our experimental results show that standard classifiers and basic adversarial retraining often remain vulnerable, with evasion rates as high as 90%, whereas the proposed bilevel optimization approach consistently achieves near-total immunity, reducing evasion rates to between 0% and 1.89%. Furthermore, the iterative framework significantly increases the attacker's query complexity, raising the average cost of a successful evasion by up to two orders of magnitude. These findings suggest that modeling the iterative cycle of attack and defense through bilevel optimization is essential for developing resilient malware detection systems capable of withstanding evolving adversarial threats.