Data mesh is an emerging decentralized approach to managing and generating value from analytical enterprise data at scale. It shifts ownership of the data to the business domains closest to the data, promotes sharing and managing data as autonomous products, and uses a federated and automated data governance model. The data mesh relies on a managed data platform that offers services to domain and governance teams so that they can build, share, and manage data products efficiently. However, designing and implementing a self-serve data platform is challenging, and platform engineers and architects must understand and choose appropriate design options to ensure that the platform enhances the experience of domain and governance teams. To this end, this paper proposes a catalog of architectural design decisions and their corresponding decision options, derived from a systematic review of 43 industrial gray literature articles on self-serve data platforms in data mesh. In addition, we conducted semi-structured interviews with six data engineering experts with data mesh experience to validate, refine, and extend the findings from the literature. Such a catalog of design decisions and options, drawn from the state of practice, should aid practitioners in building data meshes while providing a baseline for further research on data mesh architectures.