Injecting textual information into knowledge graph (KG) entity representations has proven a productive way to improve performance on KG-oriented tasks within the NLP community. The external knowledge used to enhance KG embeddings ranges from semantically rich features derived from lexical dependency parses, to sets of relevant keywords, to entire text descriptions drawn from an external corpus such as Wikipedia. Despite the gains that text-enhanced KG embeddings have made, we argue that they can be improved further. Instead of using a single text description, which cannot sufficiently represent an entity because of the inherent lexical ambiguity of text, we propose a multi-task framework that jointly selects a set of text descriptions relevant to KG entities and aligns or augments KG embeddings with those descriptions. Unlike prior work that plugs in the formal entity descriptions declared in knowledge bases, this framework uses a retriever model to selectively identify richer or more relevant text descriptions for augmenting entities. Furthermore, the framework treats the number of descriptions used in the augmentation process as a parameter, which makes it possible to try several values before settling on an appropriate one. Experimental results for link prediction show increases of 5.5% and 3.5% in Mean Reciprocal Rank (MRR) and Hits@10, respectively, compared with text-enhanced KG augmentation methods that use traditional CNNs.
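For readers unfamiliar with the two link prediction metrics reported above, the following is a minimal sketch of how MRR and Hits@k are conventionally computed from the rank assigned to the correct entity for each test triple. The function names and the example ranks are hypothetical and chosen only for illustration; they are not taken from this work's implementation or results.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of test triples whose correct entity is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
example_ranks = [1, 2, 4, 20]
mrr = mean_reciprocal_rank(example_ranks)
hits10 = hits_at_k(example_ranks, k=10)
print(f"MRR = {mrr:.3f}, Hits@10 = {hits10:.3f}")
```

Both metrics lie in the interval (0, 1], and higher is better. MRR rewards placing the correct entity near the top of the ranking, while Hits@10 only checks whether it appears among the first ten candidates.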