Data mesh is an emerging domain-driven, decentralized data architecture that aims to minimize or avoid the operational bottlenecks associated with centralized, monolithic data architectures in enterprises. The topic has piqued practitioners' interest, and there is considerable gray literature on it. At the same time, we observe a lack of academic attempts to define and build upon the concept. Hence, in this article, we aim to start from the foundations and characterize the data mesh architecture in terms of its design principles, architectural components, capabilities, and organizational roles. We systematically collected, analyzed, and synthesized 114 industrial gray literature articles. The review provides insights into practitioners' perspectives on the four key principles of data mesh: data as a product, domain ownership of data, self-serve data platform, and federated computational governance. Moreover, because data mesh is comparable to SOA (service-oriented architecture), we mapped the findings from the gray literature onto the reference architectures in the SOA academic literature to create reference architectures describing three key dimensions of data mesh: organization of capabilities and roles, development, and runtime. Finally, we discuss open research issues in data mesh, partially based on the findings from the gray literature.