Factorized Databases (FDBs) and the recently introduced Path Multiset Representations (PMRs) both aim to represent the results of database queries compactly, yet they appear quite different at first sight. FDBs were developed for the relational database model and represent finite sets of tuples, all of which have the same length. PMRs, on the other hand, were developed for the graph database model and represent possibly infinite multisets of variable-length paths. In this paper, we place both representations in a common framework rooted in formal language theory. In particular, we show why FDBs are a special case of context-free grammars, which allows us to generalize FDBs beyond the standard setting of database relations. Since PMRs are closely connected to finite automata, this opens up a wide range of questions about trade-offs between the size of automata-based and grammar-based representations and the efficiency of query-plan operations on them. As a first step, we present initial results on size trade-offs between fundamental variants of automata-based and grammar-based compact representations.
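To make the link between FDBs and context-free grammars concrete, the following minimal sketch treats a factorized relation as an acyclic grammar in which alternatives play the role of union and concatenation plays the role of product. The grammar encoding, the names `GRAMMAR`, `expand`, and `size`, and the example relation are assumptions chosen for illustration; they are not the paper's formal definitions.

```python
from itertools import product

# Hypothetical toy encoding (an assumption, not the paper's definition):
# each nonterminal maps to a list of alternatives (union), and each
# alternative is a sequence of symbols (product / concatenation).
# This grammar factorizes {a1, a2} x {b1, b2} as (a1 | a2)(b1 | b2).
GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["a1"], ["a2"]],
    "B": [["b1"], ["b2"]],
}

def expand(symbol, grammar):
    """Return the set of tuples derived from `symbol`; the grammar must be acyclic."""
    if symbol not in grammar:
        # Any symbol without a rule is a terminal, i.e. a single attribute value.
        return {(symbol,)}
    result = set()
    for alternative in grammar[symbol]:
        parts = [expand(s, grammar) for s in alternative]
        # Concatenate one derived tuple per symbol of the alternative.
        for combo in product(*parts):
            result.add(tuple(value for part in combo for value in part))
    return result

def size(grammar):
    """Count symbol occurrences on the right-hand sides of all rules."""
    return sum(len(alt) for alts in grammar.values() for alt in alts)

tuples = expand("S", GRAMMAR)
```

In this toy case the grammar has 6 right-hand-side symbols, whereas listing the 4 derived tuples explicitly takes 8 values. With n values per column, the same shape needs 2 + 2n symbols against 2n² values in the flat listing, which is the kind of size gap that compact representations exploit.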