Knowledge Graph Construction (KGC) can be seen as an iterative process that starts from a high-quality nucleus, which is then refined by knowledge extraction approaches in a virtuous loop. Such a nucleus can be obtained from knowledge already available in an open KG such as Wikidata. However, because of the size of such generic KGs, integrating them as a whole may bring in irrelevant content and raise scalability issues. We propose an analogy-based approach that starts from seed entities of interest in a generic KG and decides whether to keep or prune their neighboring entities. We evaluate our approach on Wikidata using two manually labeled datasets that contain either domain-homogeneous or domain-heterogeneous seed entities. We empirically show that our analogy-based approach outperforms LSTM, Random Forest, SVM, and MLP, with drastically fewer parameters. We also evaluate its generalization potential in a transfer learning setting. These results advocate for the further integration of analogy-based inference in tasks related to the KG lifecycle.
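The keep-or-prune traversal described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `expand_from_seeds`, the `max_depth` bound, the adjacency-dictionary graph representation, and the `keep` callback are all assumptions introduced here for illustration. In particular, `keep` stands in for whatever classifier makes the decision (an analogy-based model in the paper); here it is a hard-coded toy rule.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, Set

# Hypothetical graph representation: entity -> set of neighboring entities.
Graph = Dict[Hashable, Set[Hashable]]


def expand_from_seeds(
    graph: Graph,
    seeds: Iterable[Hashable],
    keep: Callable[[Hashable, Hashable], bool],
    max_depth: int = 2,
) -> Set[Hashable]:
    """Breadth-first expansion from seed entities.

    A neighbor is added to the result only if keep(seed, neighbor) is True;
    pruned neighbors are not expanded further. `keep` is a placeholder for
    the actual keep/prune classifier (an assumption of this sketch).
    """
    seeds = list(seeds)
    kept: Set[Hashable] = set(seeds)
    # Each queue item: (originating seed, current entity, depth from seed).
    queue = deque((s, s, 0) for s in seeds)
    while queue:
        seed, node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor in kept:
                continue
            if keep(seed, neighbor):
                kept.add(neighbor)
                queue.append((seed, neighbor, depth + 1))
    return kept


# Toy example with made-up entity identifiers.
toy_graph: Graph = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"E"},
    "D": set(),
    "E": set(),
}
relevant = {"B", "D"}  # stand-in for the classifier's decisions
result = expand_from_seeds(toy_graph, ["A"], lambda seed, n: n in relevant)
```

In this toy run, `C` is pruned, so `E` is never visited. This is what makes pruning attractive for scalability: the part of the generic KG that lies behind a pruned entity is never explored.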