Data stream mining is fundamentally challenged by concept drift, where distributional changes can degrade model performance. Despite the proliferation of drift detection methods, progress in the field is hindered by inconsistent evaluation practices: studies rely on oversimplified synthetic data generators, adopt incompatible metrics, and lack transparency in hyperparameter selection, making fair comparisons difficult. We address this gap with a novel benchmarking framework comprising three contributions: (1) a drift simulation method that injects controlled distributional changes into real-world datasets via Monte Carlo trials, enabling supervised evaluation while preserving real-world data complexity; (2) an evaluation protocol for drift detection with timing-aware criteria, including the derivation of new metrics (e.g., F1 detection score, normalized detection time) that are comparable across streams; and (3) a leave-one-dataset-out hyperparameter optimization protocol for drift detection methods that promotes configuration robustness across heterogeneous stream dynamics. We benchmark 14 widely used drift detection methods on 7 real-world datasets across 4 drift types (class prior, label swap, feature permutation, feature filtering), each under both abrupt and gradual transitions. Our experimental results provide insights into the strengths and weaknesses of current drift detection approaches while establishing baseline performance metrics for future research in this area. All code and experiments are publicly available.
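To make the ideas of controlled drift injection and timing-aware scoring concrete, the following is a minimal Python sketch for a single abrupt label-swap trial. It is an illustration under stated assumptions, not the framework's implementation: the function names (`inject_label_swap`, `f1_detection`, `normalized_detection_time`), the tolerance window, and the rule that only the first in-window alarm counts as a true positive are all hypothetical choices made here, and the metric definitions in the full paper may differ.

```python
def inject_label_swap(labels, drift_point, class_a, class_b):
    """Return a copy of `labels` with class_a and class_b exchanged from
    `drift_point` onward (an abrupt label-swap drift). Hypothetical helper."""
    drifted = list(labels)
    for i in range(drift_point, len(drifted)):
        if drifted[i] == class_a:
            drifted[i] = class_b
        elif drifted[i] == class_b:
            drifted[i] = class_a
    return drifted


def f1_detection(detections, drift_point, tolerance):
    """Assumed timing-aware F1 for one injected drift: one alarm inside
    [drift_point, drift_point + tolerance] is the true positive; every
    other alarm (early, late, or repeated) is a false positive."""
    in_window = [d for d in detections
                 if drift_point <= d <= drift_point + tolerance]
    tp = 1 if in_window else 0
    if tp == 0:
        return 0.0
    fp = len(detections) - tp
    fn = 0  # the single injected drift was detected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def normalized_detection_time(detections, drift_point, tolerance):
    """Assumed normalization: delay of the first in-window alarm divided by
    the window length, so values lie in [0, 1]; None if the drift is missed."""
    in_window = [d for d in detections
                 if drift_point <= d <= drift_point + tolerance]
    if not in_window:
        return None
    return (min(in_window) - drift_point) / tolerance
```

In a Monte Carlo setting, one would repeat such a trial with randomly drawn drift positions and average the scores. Dividing the delay by the window length is one simple way to keep detection time comparable across streams of different lengths.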