The core reasoning task for datalog engines is materialization: the evaluation of a datalog program over a database, together with the physical incorporation of the results into the database itself. The de facto method of computing it is the recursive application of inference rules. Because materialization is costly, datalog engines must provide incremental materialization, that is, adjust the computation to new data instead of restarting from scratch. One major caveat is that deleting data is notoriously more involved than adding it, since one has to take into account all data that has been entailed from what is being deleted. Differential Dataflow is a computational model that provides efficient incremental maintenance of iterative dataflows, notably with equal performance for additions and deletions, as well as work distribution. In this paper we investigate the performance of materialization with three reference datalog implementations: one is built on top of a lightweight relational engine, and the other two are differential-dataflow and non-differential versions of the same rewrite algorithm, with the same optimizations.
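To make the notion of materialization concrete, the following is a minimal sketch of naive fixpoint evaluation for a two-rule reachability program. It is not one of the three implementations studied here; the `edge`/`path` program, the function name `materialize`, and the sample data are assumptions chosen purely for illustration. The sketch also hints at why deletion is harder than addition: retracting a single base fact can invalidate several derived facts.

```python
def materialize(edges):
    """Naive materialization: apply the rules until no new fact is derived.

    Illustrative program (an assumption, not taken from the paper):
        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    """
    # First rule: every edge is a path.
    path = set(edges)
    while True:
        # Second rule: extend each known path by one edge.
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        new_facts = derived - path
        if not new_facts:
            # Fixpoint reached: the rules entail nothing further.
            return path
        path |= new_facts


edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = materialize(edges)

# Deleting one base fact: here we simply recompute from scratch, which is
# exactly the cost that incremental materialization tries to avoid.
closure_after_delete = materialize(edges - {("b", "c")})
lost = closure - closure_after_delete

print(sorted(closure))
print(sorted(lost))
```

In this sketch, removing the single fact `edge(b, c)` removes four facts from the materialization, because every path that was derived through that edge must be retracted as well.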