Deployed machine learning systems face distribution drift, yet most monitoring pipelines stop at alarms and leave the response underspecified under labeling, compute, and latency constraints. We introduce Drift2Act, a drift-to-action controller that treats monitoring as constrained decision-making with explicit safety. Drift2Act combines a sensing layer that maps unlabeled monitoring signals to a belief over drift types with an active risk certificate that queries a small set of delayed labels from a recent window to produce an anytime-valid upper bound $U_t(\delta)$ on current risk. The certificate gates operation: if $U_t(\delta) \le \tau$, the controller selects low-cost actions (e.g., recalibration or test-time adaptation); if $U_t(\delta) > \tau$, it activates abstain/handoff and escalates to rollback or retraining under cooldowns. In a realistic streaming protocol with label delay and explicit intervention costs, Drift2Act achieves near-zero safety violations and fast recovery at moderate cost on WILDS Camelyon17, DomainNet, and a controlled synthetic drift stream, outperforming alarm-only monitoring, adapt-always adaptation, schedule-based retraining, selective prediction alone, and an ablation without certification. Overall, online risk certification enables reliable drift response and reframes monitoring as decision-making with safety.
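The certificate gate described in the abstract can be illustrated with a minimal sketch. This is not the Drift2Act certificate itself: it substitutes a simple Hoeffding bound with a union bound over sample counts, which is anytime-valid only under the assumption that the queried losses are i.i.d. and lie in $[0, 1]$. The function names `anytime_upper_bound` and `gate`, the action labels, and the numeric values of $\delta$ and $\tau$ are all hypothetical and chosen for illustration.

```python
import math
from typing import Sequence


def anytime_upper_bound(losses: Sequence[float], delta: float) -> float:
    """Upper bound on mean risk from labeled losses in [0, 1].

    Assumption (not from the paper): a Hoeffding bound with error budget
    delta_n = delta / (n * (n + 1)) at sample count n. These budgets sum
    to delta over all n >= 1, so the bound holds simultaneously for every
    n with probability at least 1 - delta under i.i.d. losses.
    """
    n = len(losses)
    if n == 0:
        return 1.0  # no labels yet: fall back to the trivial bound
    delta_n = delta / (n * (n + 1))
    mean = sum(losses) / n
    radius = math.sqrt(math.log(1.0 / delta_n) / (2.0 * n))
    return min(1.0, mean + radius)


def gate(upper_bound: float, tau: float) -> str:
    """Map the certificate to a coarse action class (hypothetical labels)."""
    if upper_bound <= tau:
        return "low_cost_action"       # e.g., recalibration or test-time adaptation
    return "abstain_and_escalate"      # abstain/handoff, then rollback or retraining


if __name__ == "__main__":
    delta, tau = 0.05, 0.2              # illustrative values only
    clean_window = [0.0] * 200          # all queried labels agree with predictions
    drifted_window = [0.0, 1.0] * 100   # half of the queried labels disagree
    for name, window in [("clean", clean_window), ("drifted", drifted_window)]:
        u = anytime_upper_bound(window, delta)
        print(name, round(u, 3), gate(u, tau))
```

The sketch omits the sensing layer, label delay, cooldowns, and intervention costs, all of which the controller described above takes into account. It shows only how a high-probability risk bound can switch the system between low-cost actions and abstention.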