The problem of answering logical queries over incomplete knowledge graphs is receiving significant attention in the machine learning community. Neuro-symbolic models are a promising recent approach, showing good performance and offering good interpretability. These models rely on trained architectures to execute atomic queries, combining them with modules that simulate the symbolic operators in queries. Unfortunately, most neuro-symbolic query processors are limited to so-called tree-like logical queries, which admit a bottom-up execution in which the leaves are constant values or anchors and the root is the target variable. Tree-like queries, while expressive, fall short of expressing properties of knowledge graphs that are important in practice, such as the existence of multiple edges between entities or the presence of triangles. We propose a framework for answering arbitrary conjunctive queries over incomplete knowledge graphs. The main idea of our method is to approximate a cyclic query by an infinite family of tree-like queries, and then leverage existing models for the latter. Our approximations achieve strong guarantees: they are complete, i.e., there are no false negatives, and optimal, i.e., they provide the best possible approximation using tree-like queries. Our method requires the approximations to be tree-like queries whose leaves are anchors or existentially quantified variables. Hence, we also show how some existing neuro-symbolic models can handle these queries, which is of independent interest. Experiments show that our approximation strategy achieves competitive results, and that including queries with existentially quantified variables tends to improve the overall performance of these models, both on tree-like queries and on our approximation strategy.
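To make the approximation idea concrete, the following is a minimal sketch and not the implementation described in the abstract. It assumes a simplified unraveling construction: starting from the target variable, the query graph is walked up to a fixed depth, every revisited variable gets a fresh existentially quantified copy, and the walk never immediately reuses the atom it arrived through. The names `unravel`, `answers`, `bind`, `TRIANGLE`, and `KG` are hypothetical, and exact symbolic evaluation on a toy graph stands in for a trained neuro-symbolic model.

```python
from itertools import count

# A query is a list of atoms (relation, source, target).
# Terms starting with "?" are variables; anything else is an anchor (constant).
TRIANGLE = [("r", "?x", "?y"), ("r", "?y", "?z"), ("r", "?z", "?x")]

# Toy knowledge graph (an assumption for illustration): a triangle a-b-c
# plus a directed chain p -> q -> s -> t -> u.
KG = [("a", "r", "b"), ("b", "r", "c"), ("c", "r", "a"),
      ("p", "r", "q"), ("q", "r", "s"), ("s", "r", "t"), ("t", "r", "u")]


def unravel(atoms, target, depth):
    """Return a tree-like query obtained by walking the query graph from
    `target` for at most `depth` steps (simplified construction)."""
    fresh = count()
    tree_atoms = []

    def expand(term, copy, came_from, remaining):
        # Anchors are leaves; variables at the depth limit stay as
        # existentially quantified leaves.
        if remaining == 0 or not term.startswith("?"):
            return
        for i, (rel, s, t) in enumerate(atoms):
            if i == came_from or term not in (s, t):
                continue  # do not walk straight back through the same atom
            other = t if s == term else s
            # Variables get a fresh copy; anchors are kept as-is.
            other_copy = f"{other}_{next(fresh)}" if other.startswith("?") else other
            tree_atoms.append((rel, copy, other_copy) if s == term
                              else (rel, other_copy, copy))
            expand(other, other_copy, i, remaining - 1)

    expand(target, target, None, depth)
    return tree_atoms


def bind(binding, term, value):
    """Try to match `term` against `value` under the partial `binding`."""
    if not term.startswith("?"):
        return term == value
    return binding.setdefault(term, value) == value


def answers(atoms, target, triples):
    """Exact answers of a conjunctive query, joining one atom at a time.
    Assumes at least one atom mentions `target`."""
    partial = [{}]
    for rel, s, t in atoms:
        extended = []
        for binding in partial:
            for head, r, tail in triples:
                if r != rel:
                    continue
                new = dict(binding)
                if bind(new, s, head) and bind(new, t, tail):
                    extended.append(new)
        partial = extended
    return {b[target] for b in partial}


exact = answers(TRIANGLE, "?x", KG)
approx2 = answers(unravel(TRIANGLE, "?x", 2), "?x", KG)
approx3 = answers(unravel(TRIANGLE, "?x", 3), "?x", KG)

print(sorted(exact))    # ['a', 'b', 'c']
print(sorted(approx2))  # ['a', 'b', 'c', 's']
print(sorted(approx3))  # ['a', 'b', 'c']
```

In this toy run, every exact answer of the triangle query is also returned by both approximations, which illustrates the completeness property: mapping each fresh copy back to its original variable is a homomorphism from the tree-like query to the cyclic one, so no answer can be lost. The depth-2 approximation admits one false positive (`s`, the middle of the chain), which the depth-3 approximation rules out, showing how deeper members of the family can tighten the approximation.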