The growing use of large machine learning (ML) models raises concerns about their increasing computational demands. While the energy consumption of the training phase has received attention, fewer works have considered the inference phase. For ML inference, ML serving, i.e., the binding of ML models to the ML system for user access, is a critical yet understudied step for achieving efficiency in ML applications. We examine the literature on ML architectural design decisions and Green AI, with a special focus on ML serving. Our aim is to understand and identify ML serving architectural design decisions with respect to quality characteristics, from the point of view of researchers and practitioners, in the context of the ML serving literature. Our results (i) identify ML serving architectural design decisions along with their corresponding components and associated technology stack, and (ii) provide an overview of the quality characteristics studied in the literature, including energy efficiency. This preliminary study is the first step toward our goal of achieving green ML serving. Our analysis may aid ML researchers and practitioners in making green-aware architectural design decisions when serving their models.