Deep learning-based malware detectors have been shown to be susceptible to adversarial malware examples, i.e., malware that has been deliberately manipulated to avoid detection. In light of this vulnerability to subtle modifications of the input file, we propose a practical defense against adversarial malware examples inspired by (de)randomized smoothing. Rather than randomizing inputs with Gaussian noise, as is done in the Computer Vision (CV) domain, we select correlated subsets of bytes, which reduces the chance of sampling adversarial content injected by malware authors. During training, our ablation-based smoothing scheme trains a base classifier to classify a subset of contiguous bytes, or chunk, of a file. At test time, the base classifier classifies a large number of chunks, and the consensus among these classifications is reported as the final prediction. We propose two strategies for determining the locations of the chunks used for classification: (1) randomly selecting the chunk locations and (2) selecting contiguous adjacent chunks. To showcase the effectiveness of our approach, we trained two classifiers with our chunk-based ablation schemes on the BODMAS dataset. Our findings reveal that the chunk-based smoothing classifiers exhibit greater resilience against adversarial malware examples generated with state-of-the-art evasion attacks, outperforming a non-smoothed classifier and a randomized smoothing-based classifier by a wide margin.
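The test-time procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the chunk size, and in particular `toy_base_classifier` (a byte-signature check standing in for the trained deep learning base classifier) are assumptions made purely for illustration, and "consensus" is interpreted here as a simple majority vote.

```python
import random
from collections import Counter
from typing import Callable, List


def random_chunks(data: bytes, chunk_size: int, n_chunks: int,
                  rng: random.Random) -> List[bytes]:
    """Strategy (1): take chunks at uniformly random start offsets."""
    last_start = max(len(data) - chunk_size, 0)
    starts = [rng.randint(0, last_start) for _ in range(n_chunks)]
    return [data[s:s + chunk_size] for s in starts]


def sequential_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Strategy (2): split the file into contiguous adjacent chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def smoothed_predict(chunks: List[bytes],
                     base_classifier: Callable[[bytes], int]) -> int:
    """Classify every chunk and return the majority label.

    Ties are broken arbitrarily (first label seen); a real system
    would need an explicit tie-breaking or abstention rule.
    """
    if not chunks:
        raise ValueError("at least one chunk is required")
    votes = Counter(base_classifier(chunk) for chunk in chunks)
    return votes.most_common(1)[0][0]


def toy_base_classifier(chunk: bytes) -> int:
    """Hypothetical stand-in for the trained base classifier:
    label 1 (malicious) if a fixed byte signature occurs, else 0."""
    return 1 if b"EVIL" in chunk else 0


if __name__ == "__main__":
    # A toy "malicious" file with a short benign-looking region appended,
    # mimicking content injected to sway the classifier.
    sample = b"EVIL" * 64 + b"\x00" * 32
    rng = random.Random(0)
    print(smoothed_predict(sequential_chunks(sample, 16), toy_base_classifier))
    print(smoothed_predict(random_chunks(sample, 16, 100, rng), toy_base_classifier))
```

Because each chunk casts a single vote, content injected into a small region of the file can only change the votes of the chunks that overlap it, which is the intuition behind the robustness claim.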