Machine Learning Operations (MLOps) is becoming a crucial practice for businesses seeking to capitalize on the benefits of AI and ML models. This research presents a detailed review of MLOps, covering its benefits, challenges, and evolution, as well as key underlying technologies such as MLOps frameworks, Docker, GitHub Actions, and Kubernetes. It explains the MLOps workflow in detail, including model design, deployment, and operations, along with the tools needed for model and data exploration and for deployment. The article also sheds light on the end-to-end production of ML projects through automated pipelines at different maturity levels. These range from no automation at all at the lowest level to full CI/CD and continuous training (CT) capabilities at the highest. Furthermore, a detailed example of an enterprise-level MLOps project for an object detection service illustrates the workflow in a real-world scenario. For this purpose, a web application hosting a pre-trained model from the TensorFlow 2 Model Zoo is packaged and deployed to the internet, ensuring that the system is scalable, reliable, and optimized for enterprise-level deployment.