Organizations rely on machine learning engineers (MLEs) to deploy models and maintain ML pipelines in production. Due to models' extensive reliance on fresh data, the operationalization of machine learning, or MLOps, requires MLEs to have proficiency in data science and engineering. When considered holistically, the job seems staggering -- how do MLEs do MLOps, and what are their unaddressed challenges? To address these questions, we conducted semi-structured ethnographic interviews with 18 MLEs working on various applications, including chatbots, autonomous vehicles, and finance. We find that MLEs engage in a workflow of (i) data preparation, (ii) experimentation, (iii) evaluation throughout a multi-staged deployment, and (iv) continual monitoring and response. Throughout this workflow, MLEs collaborate extensively with data scientists, product stakeholders, and one another, supplementing routine verbal exchanges with communication tools ranging from Slack to organization-wide ticketing and reporting systems. We introduce the 3Vs of MLOps: velocity, visibility, and versioning -- three virtues of successful ML deployments that MLEs learn to balance and grow as they mature. Finally, we discuss design implications and opportunities for future work.