Datalog is a powerful yet elegant language for expressing recursive computation. Although Datalog evaluation has been studied extensively, only loose upper bounds are known so far on how fast a Datalog program can be evaluated. In this work, we ask the following question: given a Datalog program over a naturally-ordered semiring $\sigma$, what is the tightest possible runtime? To this end, our main contribution is a general two-phase framework for analyzing the data complexity of Datalog over $\sigma$: first ground the program into an equivalent system of polynomial equations (the grounding), and then find the least fixpoint of the grounding over $\sigma$. We present algorithms that use structure-aware query evaluation techniques to obtain the smallest possible groundings. We then introduce efficient algorithms for fixpoint evaluation over two classes of semirings: (1) finite-rank semirings and (2) absorptive semirings of total order. Combining both phases, we obtain state-of-the-art and new algorithmic results. Finally, we complement our results with a matching fine-grained lower bound.
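As a rough illustration of the two phases, the following minimal sketch grounds the transitive-closure program `T(x,y) :- E(x,y)` and `T(x,y) :- T(x,z), E(z,y)` over the tropical semiring (min, +) and then computes the least fixpoint by naive iteration. It is not the structure-aware grounding or the fixpoint algorithms of this work; the example program, the function names `ground` and `least_fixpoint`, and the edge weights are all assumptions made for illustration, and the weights are assumed non-negative so that the iteration terminates.

```python
import math
from itertools import product

# Tropical semiring (min, +): semiring "addition" is min, "multiplication" is +.
ZERO = math.inf  # identity for min


def s_add(a, b):
    return min(a, b)


def s_mul(a, b):
    return a + b


def ground(edges):
    """Phase 1 (naive grounding, illustrative only).

    Produces one equation per ground atom T(x, y). Each equation is a list of
    monomials (coefficient, variable), where variable is another ground atom
    or None for a constant term.
    """
    nodes = sorted({v for edge in edges for v in edge})
    equations = {}
    for x, y in product(nodes, repeat=2):
        monomials = []
        if (x, y) in edges:                      # T(x,y) :- E(x,y)
            monomials.append((edges[(x, y)], None))
        for z in nodes:                          # T(x,y) :- T(x,z), E(z,y)
            if (z, y) in edges:
                monomials.append((edges[(z, y)], (x, z)))
        equations[(x, y)] = monomials
    return equations


def least_fixpoint(equations):
    """Phase 2 (naive iteration): start from the semiring zero and re-evaluate
    every equation until no value changes."""
    values = {head: ZERO for head in equations}
    changed = True
    while changed:
        changed = False
        updated = {}
        for head, monomials in equations.items():
            acc = ZERO
            for coeff, var in monomials:
                term = coeff if var is None else s_mul(coeff, values[var])
                acc = s_add(acc, term)
            updated[head] = acc
            if acc != values[head]:
                changed = True
        values = updated
    return values


# Hypothetical weighted edge relation E.
edges = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
dist = least_fixpoint(ground(edges))
print(dist[("a", "c")])  # shortest a-to-c weight under (min, +)
```

Over (min, +) the least fixpoint of this grounding gives shortest-path weights, with unreachable pairs left at the semiring zero (infinity). The naive grounding here has one variable per pair of nodes, which is exactly the kind of size that structure-aware grounding aims to reduce.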