Factorized representations (FRs) are a well-known tool for succinctly representing the results of join queries, and they were originally defined from the named database perspective. We define FRs from the unnamed database perspective and use them to establish several new connections. First, unnamed FRs can be exponentially more succinct than named FRs, but this gap can be alleviated by imposing a disjointness condition on columns. Conversely, named FRs can also be exponentially more succinct than unnamed FRs. Second, unnamed FRs are the same as (i.e., isomorphic to) context-free grammars for languages in which all words have the same length. This tight connection allows us to transfer a wide range of results on context-free grammars to database factorization, a selection of which we present in the paper. Third, when we generalize unnamed FRs to arbitrary sets of tuples, they become a generalization of \emph{path multiset representations}, a formalism recently introduced to succinctly represent sets of paths in the context of graph database query evaluation.
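The correspondence between unnamed FRs and context-free grammars can be made concrete with a minimal Python sketch. It rests on our own assumptions: an unnamed FR is encoded as an acyclic grammar in which a nonterminal's list of alternatives plays the role of a union, and each alternative concatenates its parts (an ordered Cartesian product, since columns are identified by position). The names `Grammar`, `tuples`, and `arity` are hypothetical and not taken from the paper, whose formal definitions may differ. The `arity` check mirrors the same-length condition: every word derived from a nonterminal must have the same length, which corresponds to a fixed number of columns.

```python
from functools import lru_cache
from itertools import product

# Hypothetical encoding (an assumption, not the paper's definition): each
# nonterminal maps to its alternatives (a union), and each alternative is a
# sequence of symbols whose words are concatenated (an ordered product).
# Any symbol that is not a key of the grammar is a terminal, i.e. a value.
Grammar = dict[str, list[tuple[str, ...]]]


def tuples(grammar: Grammar, start: str) -> set[tuple[str, ...]]:
    """Enumerate the words (= tuples) derivable from `start`.

    Assumes the grammar is acyclic, so every language is finite.
    """
    @lru_cache(maxsize=None)
    def words(symbol: str) -> frozenset[tuple[str, ...]]:
        if symbol not in grammar:  # terminal: a single tuple of length 1
            return frozenset({(symbol,)})
        result = set()
        for alternative in grammar[symbol]:
            # Every choice of one word per part, concatenated in order.
            for parts in product(*(words(s) for s in alternative)):
                result.add(tuple(v for part in parts for v in part))
        return frozenset(result)

    return set(words(start))


def arity(grammar: Grammar, start: str) -> int:
    """Return the common word length of `start` (its number of columns).

    Raises ValueError if some reachable nonterminal derives words of
    different lengths, i.e. it does not describe same-arity tuples.
    """
    @lru_cache(maxsize=None)
    def length(symbol: str) -> int:
        if symbol not in grammar:
            return 1
        lengths = {sum(length(s) for s in alt) for alt in grammar[symbol]}
        if len(lengths) != 1:
            raise ValueError(f"{symbol!r} derives lengths {sorted(lengths)}")
        return lengths.pop()

    return length(start)


# A relation {1, 2} x {x, y}, factorized as a product of two unions.
r = {"R": [("A", "B")], "A": [("1",), ("2",)], "B": [("x",), ("y",)]}
print(arity(r, "R"), sorted(tuples(r, "R")))
# 2 [('1', 'x'), ('1', 'y'), ('2', 'x'), ('2', 'y')]

# Shared nonterminals: D3 is defined by four rules but derives 256 tuples.
d = {"D0": [("a",), ("b",)]}
for i in range(1, 4):
    d[f"D{i}"] = [(f"D{i-1}", f"D{i-1}")]
print(arity(d, "D3"), len(tuples(d, "D3")))
# 8 256
```

The second example hints at why such representations can be succinct: because rules may reuse nonterminals, the number of tuples can grow far faster than the size of the grammar. The enumeration in `tuples` is only there to check small cases and does not exploit this sharing.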