We propose a novel database model whose basic structure is a labeled, directed, acyclic graph with a single root, in which the nodes represent the data sets of an application and the edges represent functional relationships among the data sets. We call such a graph an application context, or simply a context. The query language of a context consists of two types of queries: traversal queries and analytic queries. Both types are defined using a simple functional algebra whose operations are functional restriction, composition of functions, pairing of functions, and Cartesian product of sets. Roughly speaking, traversal queries parallel relational algebra queries, whereas analytic queries parallel SQL Group-by queries. In other words, in our model traversal queries and analytic queries are both defined within the same formal framework, in contrast to the relational model, where analytic queries are defined outside the relational algebra. Therefore, a distinctive feature of our model is that it supports data management and data analytics within the same formal framework. We demonstrate the expressive power of our model by showing: (a) how a relational database can be defined as a view over a context, with the context playing the role of an underlying semantic layer; (b) how an analytic query over a context can be rewritten at two orthogonal levels: at the level of the traversal queries that do the grouping and measuring, and at the level of the analytic query itself; and (c) how a context can be used as a user-friendly interface for querying relations and analysing relational data.
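The sketch below illustrates the four algebra operations and the Group-by analogy. It rests on several assumptions. Functions between finite data sets are represented as Python dicts. An analytic query is modelled as a triple (g, m, op): a grouping function g, a measuring function m, and a reduction operation op. All names and data (`restrict`, `compose`, `pair`, `cartesian`, `analytic_query`, and the invoice example) are hypothetical illustrations, not the paper's definitions.

```python
from collections import defaultdict
from itertools import product
from typing import Any, Callable, Dict, Iterable, Set, Tuple

# Hypothetical representation: a function between finite data sets is a dict.
Fn = Dict[Any, Any]

def restrict(f: Fn, domain: Iterable) -> Fn:
    """Functional restriction: keep only the arguments of f that lie in `domain`."""
    keep = set(domain)
    return {x: y for x, y in f.items() if x in keep}

def compose(g: Fn, f: Fn) -> Fn:
    """Composition g o f, defined wherever f(x) lies in the domain of g."""
    return {x: g[y] for x, y in f.items() if y in g}

def pair(f: Fn, g: Fn) -> Fn:
    """Pairing: x -> (f(x), g(x)), defined on the common domain of f and g."""
    return {x: (f[x], g[x]) for x in f if x in g}

def cartesian(a: Iterable, b: Iterable) -> Set[Tuple]:
    """Cartesian product of two sets."""
    return set(product(a, b))

def analytic_query(g: Fn, m: Fn, op: Callable[[list], Any]) -> Fn:
    """Group items by g, measure each with m, and reduce each group with op (like GROUP BY)."""
    groups = defaultdict(list)
    for x in g:
        if x in m:
            groups[g[x]].append(m[x])
    return {key: op(vals) for key, vals in groups.items()}

# Hypothetical context: invoices -> branch -> region, invoices -> product, invoices -> quantity.
branch = {"i1": "b1", "i2": "b1", "i3": "b2", "i4": "b3"}
prod = {"i1": "p1", "i2": "p2", "i3": "p1", "i4": "p1"}
qty = {"i1": 200, "i2": 100, "i3": 200, "i4": 400}
region = {"b1": "North", "b2": "North", "b3": "South"}

# Traversal query: the region of each invoice, obtained by composition.
inv_region = compose(region, branch)
# Pairing yields a grouping function whose values lie in a Cartesian product.
by_region_product = pair(inv_region, prod)
# Analytic query: total quantity per (region, product).
totals = analytic_query(by_region_product, qty, sum)
print(totals)  # {('North', 'p1'): 400, ('North', 'p2'): 100, ('South', 'p1'): 400}

# Restriction: measure only invoices with quantity >= 200, then group by region.
large = restrict(qty, [x for x, q in qty.items() if q >= 200])
print(analytic_query(inv_region, large, sum))  # {'North': 400, 'South': 400}
```

In this sketch both query types are built from the same dict-based operations. That mirrors, informally, the abstract's point that traversal and analytic queries share one formal framework.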