In modern MLOps environments, model deployment is a critical process that traditionally relies on static heuristics such as validation-error comparisons and A/B testing. These methods, however, require human intervention to adapt to real-world deployment challenges such as model drift or unexpected performance degradation. We investigate whether reinforcement learning (RL), specifically multi-armed bandit (MAB) algorithms, can dynamically manage model deployment decisions more effectively. Our approach enables more adaptive production environments by continuously evaluating deployed models and rolling back underperforming ones in real time. We test six model selection strategies on two real-world datasets and find that RL-based approaches match or exceed traditional methods in performance. Our findings suggest that RL-based model management can improve automation, reduce reliance on manual intervention, and mitigate the risks associated with post-deployment model failures.
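To make the deployment-as-bandit framing concrete, the following is a minimal sketch of one classic MAB strategy, epsilon-greedy, applied to serving traffic across model variants. It is an illustration of the general idea, not the paper's actual implementation: the model names, the binary reward signal, and the epsilon value are all assumptions introduced here for the example.

```python
import random


class EpsilonGreedyModelSelector:
    """Epsilon-greedy multi-armed bandit over deployed model variants.

    Each arm is a candidate model. Observed rewards (e.g. 1 for a
    correct prediction, 0 otherwise -- an illustrative choice) update a
    running value estimate per model, so underperforming models are
    served less often over time, approximating an automatic rollback.
    """

    def __init__(self, model_names, epsilon=0.1, seed=None):
        self.model_names = list(model_names)
        self.epsilon = epsilon
        self.counts = {m: 0 for m in self.model_names}
        self.values = {m: 0.0 for m in self.model_names}
        self.rng = random.Random(seed)

    def select(self):
        # Explore a random model with probability epsilon; otherwise
        # exploit the model with the highest estimated reward.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.model_names)
        return max(self.model_names, key=self.values.get)

    def update(self, model, reward):
        # Incremental mean update of the chosen model's value estimate.
        self.counts[model] += 1
        n = self.counts[model]
        self.values[model] += (reward - self.values[model]) / n


# Hypothetical usage: "model_a" silently outperforms "model_b", and the
# bandit shifts traffic toward it without any manual intervention.
selector = EpsilonGreedyModelSelector(["model_a", "model_b"],
                                      epsilon=0.1, seed=0)
true_accuracy = {"model_a": 0.9, "model_b": 0.5}  # assumed ground truth
sim = random.Random(1)
for _ in range(2000):
    chosen = selector.select()
    reward = 1.0 if sim.random() < true_accuracy[chosen] else 0.0
    selector.update(chosen, reward)
```

In this sketch, rollback is implicit: once a model's value estimate falls behind, the greedy step stops routing traffic to it, while the epsilon fraction of exploratory traffic continues probing all variants so a recovering model can regain share.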