Query2GMM: Learning Representation with Gaussian Mixture Model for Reasoning over Knowledge Graphs

Logical query answering over Knowledge Graphs (KGs) is a fundamental yet complex task. A promising approach is to embed queries and entities jointly in the same embedding space. Research along this line suggests that a multi-modal distribution is better suited to representing answer entities than a uni-modal one, because a single query may have multiple disjoint answer subsets owing to the compositional nature of multi-hop queries and the varying latent semantics of relations. However, existing methods based on multi-modal distributions either represent each subset only coarsely, without capturing its cardinality accurately, or degenerate into uni-modal distribution learning during reasoning because they lack an effective similarity measure. To better model queries with diversified answers, we propose Query2GMM for answering logical queries over knowledge graphs. In Query2GMM, we present the GMM embedding, which represents each query using a univariate Gaussian Mixture Model (GMM). Each answer subset of a query is encoded by its cardinality, semantic center, and dispersion degree, allowing precise representation of multiple subsets. We then design a specific neural network for each operator to handle the inherent complexity of multi-modal distributions while alleviating cascading errors. Finally, we define a new similarity measure that assesses the relationship between an entity and a query's multiple answer subsets, enabling effective multi-modal distribution learning for reasoning. Comprehensive experimental results show that Query2GMM outperforms the best competitor by an absolute average of $5.5\%$. The source code is available at \url{https://anonymous.4open.science/r/Query2GMM-C42F}.
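
To make the GMM embedding concrete, the following minimal sketch represents a query as a mixture whose components each carry a weight (normalized cardinality), a mean (semantic center), and a standard deviation (dispersion degree). It is an illustration under stated assumptions, not the Query2GMM implementation: the names `Component`, `build_query_embedding`, and `mixture_log_density` are hypothetical, "univariate" is read here as an independent one-dimensional Gaussian per embedding dimension, and the mixture log-density stands in for the similarity measure, which the abstract does not specify.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Component:
    """One answer subset of a query (hypothetical field names)."""
    weight: float      # normalized cardinality: share of answers in this subset
    mean: List[float]  # semantic center, one value per embedding dimension
    std: List[float]   # dispersion degree, one value per embedding dimension


def build_query_embedding(counts: Sequence[float],
                          means: Sequence[Sequence[float]],
                          stds: Sequence[Sequence[float]]) -> List[Component]:
    """Turn per-subset answer counts into mixture weights that sum to 1.

    Assumes every count and every standard deviation is strictly positive.
    """
    total = float(sum(counts))
    return [Component(c / total, list(m), list(s))
            for c, m, s in zip(counts, means, stds)]


def component_log_density(x: Sequence[float], comp: Component) -> float:
    """Sum of independent univariate Gaussian log-densities over dimensions."""
    log_p = 0.0
    for xi, mu, sigma in zip(x, comp.mean, comp.std):
        z = (xi - mu) / sigma
        log_p += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return log_p


def mixture_log_density(x: Sequence[float], comps: Sequence[Component]) -> float:
    """Log-density of an entity embedding under the query mixture.

    Stand-in score only: the paper's similarity measure is not given here.
    Uses the log-sum-exp trick for numerical stability.
    """
    terms = [math.log(c.weight) + component_log_density(x, c) for c in comps]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


# Toy query with two disjoint answer subsets holding 3 and 1 answers.
query = build_query_embedding(
    counts=[3, 1],
    means=[[0.0, 0.0], [10.0, 10.0]],
    stds=[[1.0, 1.0], [1.0, 1.0]],
)
score_near = mixture_log_density([0.0, 0.0], query)  # at the first center
score_far = mixture_log_density([5.0, 5.0], query)   # between the two centers
```

In this toy setup, an entity at either semantic center scores higher than one lying between the centers, which is the behavior a single uni-modal Gaussian fitted to both subsets would not reproduce.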
