Malware detection and classification into families are critical tasks in cybersecurity, complicated by the continual evolution of malware to evade detection. This evolution introduces concept drift, in which the statistical properties of malware features change over time, reducing the effectiveness of static machine learning models. Understanding and explaining this drift is essential for maintaining robust and trustworthy malware detectors. In this paper, we propose an interpretable approach to concept drift detection. Our method uses a rule-based classifier to generate human-readable descriptions of both original and evolved malware samples belonging to the same malware family. By comparing the resulting rule sets using a similarity function, we can detect and quantify concept drift. Crucially, this comparison also identifies the specific features and feature values that have changed, providing clear explanations of how malware has evolved to bypass detection. Experimental results demonstrate that the proposed method not only accurately detects drift but also provides actionable insights into the behavior of evolving malware families, supporting both detection and threat analysis.
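To make the rule-set comparison more concrete, here is a minimal Python sketch. It is an illustration only and does not reproduce the method described here. It assumes each rule is a conjunction of (feature, operator, value) conditions already produced by some rule learner. It uses Jaccard similarity over the pooled conditions as a stand-in for the unspecified similarity function. The `Condition` type, the feature names, and the 0.5 drift threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Condition:
    """One test in a rule body, e.g. num_sections > 5 (hypothetical representation)."""
    feature: str
    op: str
    value: object  # must be hashable


# A rule is a conjunction of conditions; a rule set describes one family snapshot.
Rule = FrozenSet[Condition]


def pooled_conditions(rules: List[Rule]) -> Set[Condition]:
    """Collect every condition used anywhere in the rule set."""
    return set().union(*rules)


def jaccard(a: Set[Condition], b: Set[Condition]) -> float:
    """Jaccard similarity; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def detect_drift(old_rules: List[Rule], new_rules: List[Rule], threshold: float = 0.5) -> dict:
    """Compare rule sets learned on original vs. evolved samples of one family.

    Assumption: Jaccard over pooled conditions stands in for the actual
    similarity function; `threshold` is an arbitrary illustrative cut-off.
    """
    old_c, new_c = pooled_conditions(old_rules), pooled_conditions(new_rules)
    similarity = jaccard(old_c, new_c)
    # Conditions present in only one rule set point to features whose
    # presence or value changed between the two snapshots.
    changed = sorted({c.feature for c in old_c ^ new_c})
    return {"similarity": similarity, "drift": similarity < threshold, "changed": changed}


if __name__ == "__main__":
    # Hypothetical rule sets for one family before and after evolution.
    old = [
        frozenset({Condition("api_CreateRemoteThread", "==", 1), Condition("num_sections", ">", 5)}),
        frozenset({Condition("imports_ws2_32", "==", 1)}),
    ]
    new = [
        frozenset({Condition("api_CreateRemoteThread", "==", 1), Condition("num_sections", ">", 8)}),
        frozenset({Condition("packed", "==", 1)}),
    ]
    print(detect_drift(old, new))
    # {'similarity': 0.2, 'drift': True, 'changed': ['imports_ws2_32', 'num_sections', 'packed']}
```

In this toy case, the drift flag comes with an explanation. The `num_sections` threshold has moved, and a networking import has been replaced by a packing flag. A fuller treatment would probably match rules pairwise or weight them by coverage instead of pooling all conditions into a single set.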