Real-world Machine Learning Systems: A survey from a Data-Oriented Architecture Perspective

With the upsurge of interest in artificial intelligence, Machine Learning (ML) models are increasingly deployed as parts of real-world systems. The design, implementation, and maintenance of such systems are challenged by real-world environments that produce ever larger amounts of heterogeneous data and by users who require increasingly fast responses with efficient resource consumption. These requirements push prevalent software architectures to their limits when deploying ML-based systems. Data-oriented Architecture (DOA) is an emerging concept that better equips systems for integrating ML models. DOA extends current architectures to create data-driven, loosely coupled, decentralised, open systems. Even though papers on deployed ML-based systems do not mention DOA, their authors made design decisions that implicitly follow DOA. Why, how, and to what extent DOA is adopted in these systems remains unclear. Because these design decisions are implicit, practitioners have limited knowledge of how to apply DOA when designing ML-based systems in the real world. This paper addresses these questions by surveying real-world deployments of ML-based systems. The survey shows the design decisions of the systems and the requirements these decisions satisfy. Based on the survey findings, we also formulate practical advice to facilitate the deployment of ML-based systems. Finally, we outline open challenges to deploying DOA-based systems that integrate ML models.

