Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have for many years been foundational to democratizing data and enabling analytical use cases such as business intelligence and reporting. However, RDBMS-OLAP systems present some well-known challenges: they are optimized primarily for relational workloads; they lead to a proliferation of data copies that can become unmanageable; and because they store data in proprietary formats, they can cause vendor lock-in, restricting access to engines, tools, and capabilities beyond what the vendor offers. As the demand for data-driven decision making surges, the need for a more robust data architecture that addresses these challenges becomes ever more critical. Cloud data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but they present their own set of challenges. More recently, organizations have often adopted a two-tier architecture that uses both cloud data lakes and RDBMS-OLAP systems to take advantage of each platform. However, this approach brings additional challenges, complexity, and overhead. This paper discusses how a data lakehouse, a new architectural approach, achieves the same benefits as an RDBMS-OLAP system and a cloud data lake combined, while also providing additional advantages. We break today's data warehousing down into implementation-independent components, capabilities, and practices, and show how a lakehouse architecture satisfies each of them. We then go a step further and discuss the additional capabilities and benefits that a lakehouse architecture provides over an RDBMS-OLAP system.