Long-running machine learning models face the issue of concept drift (CD), whereby the data distribution changes over time, compromising prediction performance. Updating the model requires detecting drift by monitoring the data and/or the model for unexpected changes. However, we show that spurious correlations (SCs) can spoil the statistics tracked by detection algorithms. Motivated by this, we introduce ebc-exstream, a novel detector that leverages model explanations to identify potential SCs and human feedback to correct for them. It uses an entropy-based heuristic to reduce the amount of feedback required, cutting annotation costs. Our preliminary experiments on artificially confounded data highlight the promise of ebc-exstream for mitigating the impact of SCs on drift detection.