ML models are increasingly deployed on mobile devices to enable low-latency inference and offline operation. Once deployed, however, it is hard for ML operators to track a model's accuracy, which can degrade unpredictably (e.g., due to data drift). We design the first end-to-end system for continuously monitoring and adapting models on mobile devices without requiring feedback from users. Our key observation is that model degradation is often due to a specific root cause that may affect a large group of devices. Therefore, once the system detects consistent degradation across a large number of devices, it performs root cause analysis to determine the origin of the problem and applies a cause-specific adaptation. We evaluate the system on two computer vision datasets and show that it consistently boosts accuracy compared to existing approaches. On a dataset of photos collected from driving cars, our system improves accuracy by 15% on average.