WikiKG90Mv2, featured at NeurIPS 2022, is a large encyclopedic knowledge graph. Embedding knowledge graphs into continuous vector spaces is important for many practical applications, such as knowledge acquisition, question answering, and recommendation systems. Compared with existing knowledge graphs, WikiKG90Mv2 is much larger in scale, comprising more than 90 million entities. Graph embedding models for knowledge graphs of this size must therefore balance efficiency and accuracy. To this end, we follow the retrieve-then-re-rank pipeline and make novel modifications to both the retrieval and re-ranking stages. Specifically, we propose a priority infilling retrieval model to obtain candidates that are structurally and semantically similar. We then propose an ensemble-based re-ranking model with neighbor-enhanced representations to produce the final link prediction results among the retrieved candidates. Experimental results show that our proposed method outperforms existing baseline methods and improves the validation-set MRR from 0.2342 to 0.2839.
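As a rough illustration of the retrieve-then-re-rank idea and of the MRR metric reported above, the following minimal sketch uses toy scores only. It is not the priority infilling retriever or the neighbor-enhanced ensemble described here; the function names `retrieve`, `rerank`, and `mean_reciprocal_rank` are hypothetical, and the ensemble is assumed to be a plain average of member scores.

```python
from typing import Callable, Dict, List, Sequence


def retrieve(cheap_scores: Dict[int, float], k: int) -> List[int]:
    """Stage 1 (assumed form): keep the k entity ids with the highest cheap score."""
    return sorted(cheap_scores, key=lambda c: cheap_scores[c], reverse=True)[:k]


def rerank(candidates: Sequence[int],
           scorers: Sequence[Callable[[int], float]]) -> List[int]:
    """Stage 2 (assumed form): order candidates by the mean score of all scorers."""
    def mean_score(candidate: int) -> float:
        return sum(scorer(candidate) for scorer in scorers) / len(scorers)

    return sorted(candidates, key=mean_score, reverse=True)


def mean_reciprocal_rank(ranked_lists: Sequence[Sequence[int]],
                         answers: Sequence[int]) -> float:
    """Average of 1/rank of the true tail entity; a miss contributes 0."""
    total = 0.0
    for ranked, answer in zip(ranked_lists, answers):
        if answer in ranked:
            total += 1.0 / (list(ranked).index(answer) + 1)
    return total / len(answers)


# Toy query: three candidate tail entities with made-up retrieval scores.
candidates = retrieve({1: 0.9, 2: 0.5, 3: 0.7}, k=2)

# Two made-up ensemble members that both prefer entity 3 over entity 1.
member_a = lambda c: {1: 0.2, 3: 0.8}[c]
member_b = lambda c: {1: 0.3, 3: 0.6}[c]
final_ranking = rerank(candidates, [member_a, member_b])

# Second query has no correct answer among its candidates, so it scores 0.
mrr = mean_reciprocal_rank([final_ranking, [5, 6]], answers=[1, 7])
```

The cheap first stage limits how many candidates the more expensive ensemble has to score, which is where the efficiency gain on a graph with tens of millions of entities would come from. A correct entity dropped in the first stage cannot be recovered later, so retrieval recall bounds the final MRR.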