Applying DevOps practices to machine learning systems is termed MLOps. Unlike traditional systems, which evolve with changing requirements, machine learning systems evolve with new data. The objective of MLOps is to connect different open-source tools into a pipeline that automatically constructs a dataset, trains the machine learning model, deploys the model to production, and stores different versions of the model and dataset. The benefit of MLOps is the fast delivery of newly trained models to production, so that results remain accurate. Furthermore, MLOps practice affects the overall quality of software products and depends entirely on open-source tools. Selecting relevant open-source tools is considered a challenge, and a generalized method for selecting appropriate tools is desirable. In this paper, we present a framework for a recommendation system that processes the contextual information (e.g., nature of the data, type of the data) of a machine learning project and recommends a relevant toolchain (tech stack) for the operationalization of machine learning systems. To check the applicability of the proposed framework, four approaches, i.e., rule-based, random forest, decision trees, and k-nearest neighbors, were investigated, and precision, recall, and F-score were measured. Random forest outclassed the other approaches with the highest F-score of 0.66.
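As a rough illustration of the idea summarized in the abstract, the sketch below shows how a rule-based recommender could map project context to a toolchain, and how precision, recall, and F-score could be computed for one recommendation. The rule table, the context keys (`data_type`, `data_nature`), the tool assignments, and the function names are all hypothetical and chosen only for illustration; they are not taken from the paper, and the numbers produced are not the paper's results.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rules: (condition on project context, tools to recommend).
# Illustrative only; the paper's actual rules are not given in the abstract.
RULES: List[Tuple[Dict[str, str], Set[str]]] = [
    ({"data_type": "tabular"}, {"DVC", "MLflow"}),
    ({"data_type": "image"}, {"DVC", "Kubeflow"}),
    ({"data_nature": "streaming"}, {"Kafka"}),
]


def recommend(context: Dict[str, str]) -> Set[str]:
    """Return the union of tools from every rule whose condition matches."""
    tools: Set[str] = set()
    for condition, rule_tools in RULES:
        if all(context.get(key) == value for key, value in condition.items()):
            tools |= rule_tools
    return tools


def precision_recall_f1(recommended: Set[str], relevant: Set[str]) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single recommendation."""
    true_positives = len(recommended & relevant)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical project context and ground-truth toolchain.
context = {"data_type": "tabular", "data_nature": "streaming"}
recommended = recommend(context)
relevant = {"DVC", "MLflow", "Airflow"}
precision, recall, f1 = precision_recall_f1(recommended, relevant)
```

A learned approach such as random forest or k-nearest neighbors would replace `recommend` with a trained classifier over encoded context features, while the evaluation function could stay the same.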