Data mesh is an emerging domain-driven, decentralized data architecture that aims to minimize or avoid the operational bottlenecks associated with centralized, monolithic data architectures in enterprises. The topic has piqued practitioners' interest, and there is considerable gray literature on it. At the same time, we observe a lack of academic attempts to define and build upon the concept. Hence, in this article, we aim to start from the foundations and characterize the data mesh architecture in terms of its design principles, architectural components, capabilities, and organizational roles. We systematically collected, analyzed, and synthesized 114 industrial gray literature articles. The review provides insights into practitioners' perspectives on the four key principles of data mesh: data as a product, domain ownership of data, self-serve data platform, and federated computational governance. Moreover, because data mesh is comparable to SOA (service-oriented architecture), we mapped the findings from the gray literature onto reference architectures from the SOA academic literature, creating reference architectures that describe three key dimensions of data mesh: organization of capabilities and roles, development, and runtime. Finally, we discuss open research issues in data mesh, partly based on the findings from the gray literature.