How to Sustainably Monitor ML-Enabled Systems? Accuracy and Energy Efficiency Tradeoffs in Concept Drift Detection

ML-enabled systems deployed in production environments typically suffer from decaying prediction quality caused by concept drift, i.e., a gradual change in the statistical characteristics of a real-world domain. A simple countermeasure is to retrain ML models periodically, which unfortunately can consume a lot of energy. One recommended tactic to improve energy efficiency is therefore to systematically monitor the level of concept drift and to retrain only when it becomes unavoidable. Different methods are available for this, but we know very little about their concrete impact on the tradeoff between accuracy and energy efficiency, as these methods also consume energy themselves. To address this, we conducted a controlled experiment to study the accuracy vs. energy efficiency tradeoff of seven common methods for concept drift detection. We used five synthetic datasets, each in one version with abrupt drift and one with gradual drift, and trained six different ML models as base classifiers. Based on a full factorial design, we tested 420 combinations (7 drift detectors × 5 datasets × 2 types of drift × 6 base classifiers) and compared energy consumption and drift detection accuracy. Our results indicate that there are three types of detectors: a) detectors that sacrifice energy efficiency for detection accuracy (KSWIN), b) balanced detectors that consume low to medium energy with good accuracy (HDDM_W, ADWIN), and c) detectors that consume very little energy but are unusable in practice due to very poor accuracy (HDDM_A, PageHinkley, DDM, EDDM). By providing rich evidence for this energy efficiency tactic, our findings support ML practitioners in choosing the best-suited method of concept drift detection for their ML-enabled systems.
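
As an illustration of the full factorial design described above, the following minimal sketch enumerates every combination of the four factors. The seven detector names come from the abstract; the dataset and classifier labels (`dataset_1`, `classifier_1`, ...) are hypothetical placeholders, since the abstract does not name them, and the function `full_factorial` is an assumed helper, not part of the study's tooling.

```python
from itertools import product

# Detector names as listed in the abstract.
DETECTORS = ["KSWIN", "HDDM_W", "ADWIN", "HDDM_A", "PageHinkley", "DDM", "EDDM"]

# Hypothetical placeholder labels: the abstract gives only the counts
# (5 synthetic datasets, 6 base classifiers), not the names.
DATASETS = [f"dataset_{i}" for i in range(1, 6)]
CLASSIFIERS = [f"classifier_{i}" for i in range(1, 7)]

# The two drift types named in the abstract.
DRIFT_TYPES = ["abrupt", "gradual"]


def full_factorial():
    """Return every (detector, dataset, drift type, classifier) combination."""
    return list(product(DETECTORS, DATASETS, DRIFT_TYPES, CLASSIFIERS))


combinations = full_factorial()
print(len(combinations))  # 7 * 5 * 2 * 6 = 420
```

Each tuple corresponds to one experimental configuration for which energy consumption and drift detection accuracy would be measured.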
