The vulnerability of machine learning-based malware detectors to adversarial attacks has created a need for robust defenses. Adversarial training is effective, but it is computationally expensive to scale to large datasets and trades model performance for robustness. We hypothesize that adversarial malware exploits the low-confidence regions of a model and can be identified using the epistemic uncertainty of ML approaches; in a machine learning-based malware detector, epistemic uncertainty results from a lack of similar training samples in regions of the problem space. In particular, a Bayesian formulation can capture the distribution over model parameters and quantify epistemic uncertainty without sacrificing model performance. To verify our hypothesis, we consider Bayesian learning approaches with a mutual information-based formulation to quantify uncertainty and detect adversarial malware in the Android, Windows, and PDF malware domains. We find that quantifying uncertainty through Bayesian learning methods can defend against adversarial malware. In particular, Bayesian models: (1) are generally capable of identifying adversarial malware in both feature and problem space, (2) can detect concept drift by measuring uncertainty, and (3) when combined with a diversity-promoting approach (or better posterior approximations), yield parameter instances from the posterior that significantly enhance a detector's ability.
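To make the mutual information-based formulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common decomposition in which epistemic uncertainty is the entropy of the averaged prediction minus the average entropy of the individual posterior samples. The function names, the array layout, the example probabilities, and the threshold value are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(probs, eps=1e-12):
    """Estimate epistemic uncertainty per input from posterior samples.

    probs has shape (n_posterior_samples, n_inputs, n_classes); each slice
    along axis 0 holds the class probabilities predicted by one parameter
    instance drawn from the (approximate) posterior. This layout is an
    assumption of the sketch.
    """
    probs = np.asarray(probs, dtype=float)
    # Entropy of the averaged prediction (total predictive uncertainty).
    mean_probs = probs.mean(axis=0)
    predictive_entropy = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)
    # Average entropy of each sample's own prediction.
    per_sample_entropy = -np.sum(probs * np.log(probs + eps), axis=-1)
    expected_entropy = per_sample_entropy.mean(axis=0)
    # The gap is large when the parameter instances disagree.
    return predictive_entropy - expected_entropy


def flag_uncertain(probs, threshold):
    """Flag inputs whose mutual information exceeds a hypothetical threshold."""
    return mutual_information(probs) > threshold


# Three hypothetical posterior samples scoring two inputs as (benign, malware).
example_probs = np.array([
    [[0.05, 0.95], [0.95, 0.05]],
    [[0.05, 0.95], [0.05, 0.95]],
    [[0.05, 0.95], [0.50, 0.50]],
])
mi = mutual_information(example_probs)
flags = flag_uncertain(example_probs, threshold=0.1)  # threshold is illustrative
```

In this toy input, the samples agree on the first input, so its mutual information is close to zero, while they disagree on the second, so it is flagged. In practice the threshold would need to be calibrated on held-out data.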