Predictive models often suffer performance degradation as data distributions evolve over time, a phenomenon known as data drift. Among its forms, concept drift, in which the relationship between the explanatory variables and the response variable changes, is particularly challenging to detect and adapt to. Traditional drift detection methods typically rely on metrics such as accuracy or the distributions of individual variables, which may fail to capture subtle but significant conceptual changes. This paper introduces drifter, an R package for detecting concept drift, and proposes a novel method called Profile Drift Detection (PDD). PDD both detects drift and helps explain its cause by leveraging an explainable AI tool, Partial Dependence Profiles (PDPs). The PDD method, central to the package, quantifies changes in PDPs through novel metrics, remaining sensitive to shifts in the data stream without excessive computational cost. This approach aligns with MLOps practices, which emphasize model monitoring and adaptive retraining in dynamic environments. Experiments on synthetic and real-world datasets show that PDD outperforms existing methods, maintaining high accuracy while balancing sensitivity and stability. The results highlight its ability to retrain models adaptively as the data change, making it a robust tool for real-time applications. The paper concludes by discussing the advantages, limitations, and future extensions of the package for broader use cases.
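The abstract does not define the PDD metrics, so the following is only a rough illustration of the underlying idea. It computes a partial dependence profile for one variable from models fitted on different data windows, then compares the profiles. It is a minimal sketch under stated assumptions, written in Python with NumPy rather than R. It is not drifter's API. The names `fit_linear`, `partial_dependence`, `profile_distance`, and `make_window`, the root-mean-square distance, and the synthetic windows are all hypothetical. In the simulated data, the distribution of every explanatory variable stays the same across windows, and only the slope linking `x0` to the response flips. A monitor of variable distributions would see nothing, while the profiles diverge.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an intercept column; returns a predict function.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def partial_dependence(predict, X, feature, grid):
    # For each grid value, fix `feature` at that value in every row
    # and average the model's predictions over the dataset.
    profile = np.empty(len(grid))
    for i, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value
        profile[i] = predict(X_mod).mean()
    return profile

def profile_distance(pdp_a, pdp_b):
    # Hypothetical drift score (an assumption, not the PDD metric):
    # root-mean-square gap between two profiles evaluated on the same grid.
    return float(np.sqrt(np.mean((pdp_a - pdp_b) ** 2)))

def make_window(rng, n, slope):
    # Two explanatory variables drawn from the same distribution in every window;
    # only the slope of x0 (the "concept") may change.
    X = rng.uniform(-1, 1, size=(n, 2))
    y = slope * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, n)
    return X, y

rng = np.random.default_rng(0)
X_ref, y_ref = make_window(rng, 500, slope=2.0)    # reference window
X_same, y_same = make_window(rng, 500, slope=2.0)  # new window, same concept
X_new, y_new = make_window(rng, 500, slope=-2.0)   # new window, drifted concept

grid = np.linspace(-1, 1, 21)
# All profiles are averaged over the reference data so only the model differs.
pdp_ref = partial_dependence(fit_linear(X_ref, y_ref), X_ref, 0, grid)
pdp_same = partial_dependence(fit_linear(X_same, y_same), X_ref, 0, grid)
pdp_new = partial_dependence(fit_linear(X_new, y_new), X_ref, 0, grid)

d_same = profile_distance(pdp_ref, pdp_same)
d_new = profile_distance(pdp_ref, pdp_new)
print(f"no drift: {d_same:.3f}, concept drift: {d_new:.3f}")
```

Comparing whole profiles, instead of a single accuracy number, also shows *which* variable's effect changed and how. This is the kind of insight into the cause of drift that the abstract attributes to PDD. In practice a threshold on such a score would trigger retraining. How drifter actually chooses its metrics and thresholds is described in the paper, not here.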