Deep learning-based malware classifiers face significant challenges from concept drift. The rapid evolution of malware, especially the emergence of new families, can depress classification accuracy to near-random levels. Previous research has primarily focused on detecting drift samples and relies on expert-led analysis and labeling for model retraining. However, these methods often lack a comprehensive understanding of malware concepts and provide limited guidance for effective drift adaptation, leading to unstable detection performance and high human labeling costs. To address these limitations, we introduce DREAM, a novel system designed to surpass the capabilities of existing drift detectors and to establish an explainable drift adaptation process. DREAM enhances drift detection through model sensitivity and data autonomy. The detector, trained in a semi-supervised manner, proactively captures malware behavior concepts through classifier feedback. During testing, it uses samples generated by the detector itself, eliminating reliance on extensive training data. For drift adaptation, DREAM expands the scope of human intervention, enabling the revision of both malware labels and the concept explanations embedded within the detector's latent space. To ensure a comprehensive response to concept drift, it coordinates the update of both the classifier and the detector. Our evaluation shows that DREAM effectively improves drift detection accuracy and reduces expert analysis effort during adaptation across different malware datasets and classifiers.