Uncertainty plays a more prominent role in data management than ever before, largely because of the growing importance of machine learning-driven applications that produce large uncertain databases. A well-known approach to querying such databases is to blend rule-based reasoning with uncertainty. However, the techniques proposed so far struggle with large databases. In this paper, we address this problem by presenting a new technique for probabilistic reasoning that exploits Trigger Graphs (TGs), a notion recently introduced for the non-probabilistic setting. The intuition is that TGs can store a probabilistic model effectively by avoiding an explicit materialization of the lineage and by grouping together similar derivations of the same fact. First, we show how TGs can be adapted to support the possible world semantics. Then, we describe techniques for efficiently computing a probabilistic model and formally establish the correctness of our approach. We also present an extensive empirical evaluation using a prototype called LTGs. Our comparison against other leading engines shows that LTGs is faster, even when compared with approximate reasoning techniques, and can also reason over probabilistic databases that existing engines cannot scale to.
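To make the terms "possible world semantics" and "lineage" concrete, the following is a minimal illustrative sketch, not the LTGs algorithm. It assumes a hypothetical set of independent probabilistic facts and a hand-written lineage for one derived fact, and computes that fact's probability by brute-force enumeration of all possible worlds. The fact names, probabilities, rules, and the `probability` helper are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical input facts, each assumed to hold independently
# with the given probability.
facts = {
    "edge(a,b)": 0.9,
    "edge(b,c)": 0.8,
    "edge(a,c)": 0.5,
}

# Hypothetical lineage of the derived fact path(a,c) under the rules
#   path(X,Y) :- edge(X,Y).
#   path(X,Z) :- edge(X,Y), path(Y,Z).
# Each inner set is one derivation: a conjunction of input facts.
lineage = [{"edge(a,c)"}, {"edge(a,b)", "edge(b,c)"}]


def probability(lineage, facts):
    """Sum the weights of all possible worlds in which some derivation holds."""
    names = sorted(facts)
    total = 0.0
    # Every subset of the input facts is one possible world.
    for bits in product([True, False], repeat=len(names)):
        world = {n for n, present in zip(names, bits) if present}
        # Weight of the world: product over facts of p (present) or 1 - p (absent).
        weight = 1.0
        for n, present in zip(names, bits):
            weight *= facts[n] if present else 1.0 - facts[n]
        # The derived fact is true if at least one derivation is contained in the world.
        if any(derivation <= world for derivation in lineage):
            total += weight
    return total


p_path_ac = probability(lineage, facts)
```

The enumeration visits every subset of the input facts, so its cost is exponential in their number. It is useful only for illustrating the semantics on a toy instance.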