Sequential Query Encoding For Complex Query Answering on Knowledge Graphs

Complex Query Answering (CQA) is an important and fundamental task for knowledge graph (KG) reasoning. Query encoding (QE) has been proposed as a fast and robust solution to CQA. In the encoding process, most existing QE methods first parse the logical query into an executable computational directed acyclic graph (DAG), then use neural networks to parameterize the operators, and finally execute these neuralized operators recursively. However, this parameterization-and-execution paradigm may be over-complicated, as it can be structurally simplified to a single neural network encoder. Meanwhile, sequence encoders such as LSTM and Transformer have proved effective for encoding semantic graphs in related tasks. Motivated by this, we propose sequential query encoding (SQE) as an alternative way to encode queries for CQA. Instead of parameterizing and executing the computational graph, SQE first uses a search-based algorithm to linearize the computational graph into a sequence of tokens and then uses a sequence encoder to compute its vector representation. This vector representation is then used as a query embedding to retrieve answers from the embedding space according to similarity scores. Despite its simplicity, SQE demonstrates state-of-the-art neural query encoding performance on FB15k, FB15k-237, and NELL on an extended benchmark that includes twenty-nine types of in-distribution queries. Further experiments show that SQE also demonstrates comparable knowledge inference capability on out-of-distribution queries, whose query types are not observed during training.
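
The linearize-then-retrieve idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method's actual implementation: the nested-tuple query format, the operator tags ("e", "p", "i", "u", "n"), the bracketed token scheme, the entity and relation names, and the helper names linearize and rank_answers are all hypothetical. The sequence encoder itself is omitted; the sketch only shows a depth-first linearization of a computational graph into tokens, and ranking of candidate answers by dot-product similarity once some encoder has produced a query vector.

```python
from typing import Dict, List, Sequence


def linearize(query: tuple) -> List[str]:
    """Depth-first linearization of a nested-tuple query into tokens.

    Assumed (hypothetical) query format:
      ("e", entity)                anchor entity
      ("p", relation, subquery)    relational projection
      ("n", subquery)              negation
      ("i", sub1, sub2, ...)       intersection
      ("u", sub1, sub2, ...)       union
    """
    op = query[0]
    if op == "e":
        return ["(", "e", query[1], ")"]
    if op == "p":
        return ["(", "p", query[1]] + linearize(query[2]) + [")"]
    if op == "n":
        return ["(", "n"] + linearize(query[1]) + [")"]
    if op in ("i", "u"):
        tokens = ["(", op]
        for sub in query[1:]:
            tokens += linearize(sub)  # visit each operand in order
        tokens.append(")")
        return tokens
    raise ValueError(f"unknown operator: {op!r}")


def rank_answers(query_vec: Sequence[float],
                 entity_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Rank candidate entities by dot-product similarity to the query vector."""
    def score(name: str) -> float:
        return sum(q * e for q, e in zip(query_vec, entity_vecs[name]))
    return sorted(entity_vecs, key=score, reverse=True)


# Hypothetical query: entities related to e:Paris via r:born_in,
# excluding those related to e:MIT via r:works_at.
example = ("i",
           ("p", "r:born_in", ("e", "e:Paris")),
           ("n", ("p", "r:works_at", ("e", "e:MIT"))))
tokens = linearize(example)
print(" ".join(tokens))
```

In a full system, the token sequence would be fed to a sequence encoder such as an LSTM or Transformer to obtain the query vector passed to rank_answers; the choice of dot product as the similarity score here is likewise only an assumption.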
