This work proposes a structural approach to concept drift detection in malware classification using decision tree rulesets. Classifiers are trained across temporal windows on the EMBER2024 dataset, and drift is quantified by comparing the rule representations extracted from each window using feature importance, prediction agreement, activation stability, and coverage metrics. Each metric is correlated with both accuracy degradation and data distribution shift, which serve as complementary drift indicators. The approach is evaluated across six malware families using fixed-interval and clustering-based windowing in family-vs-benign and family-vs-family settings, and compared against RIPPER and Transcendent baselines. Results show that fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration: it is the only one for which all family pairs produce positive drift-accuracy correlations. The methods are complementary, and no single approach dominates across all pairs.
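As an illustration of the feature-level Pearson comparison mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes that the drift between two windows is scored as one minus the Pearson correlation of their feature-importance vectors, and that this score is then correlated with the accuracy drop of the first-window model. The function names, the importance vectors, and the accuracy values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def feature_drift(importances_a, importances_b):
    """Assumed drift score: 1 - Pearson r of two feature-importance vectors.

    0 means the two windows rank features identically; values near 2 mean
    the importance profiles are inverted.
    """
    return 1.0 - pearson(importances_a, importances_b)


# Hypothetical feature importances of a decision tree trained on each window.
windows = [
    [0.50, 0.30, 0.15, 0.05],  # reference window
    [0.45, 0.35, 0.15, 0.05],
    [0.20, 0.40, 0.10, 0.30],
    [0.05, 0.25, 0.10, 0.60],
]

# Hypothetical accuracy of the reference-window model on each window.
accuracy = [0.98, 0.97, 0.90, 0.81]

# Drift of every later window relative to the reference window.
drift = [feature_drift(windows[0], w) for w in windows[1:]]

# Accuracy lost relative to the reference window.
accuracy_drop = [accuracy[0] - a for a in accuracy[1:]]

# A positive value means larger structural drift coincides with larger
# accuracy loss, which is the relationship the abstract reports.
drift_accuracy_r = pearson(drift, accuracy_drop)

print("drift scores:", [round(d, 3) for d in drift])
print("drift-accuracy correlation:", round(drift_accuracy_r, 3))
```

In practice the importance vectors would come from the classifiers trained on each temporal window; the toy values here only show how a per-window drift score and a drift-accuracy correlation could be computed.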