Entity disambiguation (ED), which links mentions of ambiguous entities to their referent entities in a knowledge base, is a core component of entity linking (EL). Existing generative approaches achieve higher accuracy than classification approaches on the standardized ZELDA benchmark. Nevertheless, generative approaches require large-scale pre-training and suffer from inefficient generation. Most importantly, they often overlook entity descriptions, which can contain crucial information for distinguishing similar entities. We propose an encoder-decoder model that disambiguates entities using more detailed entity descriptions. Given a text and its candidate entities, the encoder learns interactions between the text and each candidate, producing a representation for every candidate; the decoder then fuses these representations and selects the correct entity. Experiments on various entity disambiguation benchmarks demonstrate the strong and robust performance of our model, including a +1.5% improvement on the ZELDA benchmark over GENRE. Furthermore, integrating our approach into the retrieval/reader framework yields a +1.5% improvement in end-to-end entity linking on the GERBIL benchmark over EntQA.
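The encode-then-fuse flow described above can be sketched as follows. This is a minimal, untrained toy model, not the paper's actual architecture: every name, shape, and operation (the bilinear interaction, the linear scoring head, the softmax selection) is an illustrative assumption chosen only to show how per-candidate representations are produced by an encoder and then jointly scored by a decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hidden size (illustrative)

def encode_pair(text_vec, cand_vec, W):
    # "Encoder" stand-in: one nonlinear interaction between the text and a
    # single candidate (mention context + entity description), yielding a
    # per-candidate representation.
    return np.tanh(W @ np.concatenate([text_vec, cand_vec]))

def decode(cand_reps, v):
    # "Decoder" stand-in: score all candidate representations jointly and
    # select the highest-scoring entity via a softmax over candidates.
    scores = cand_reps @ v
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

# Toy inputs: one mention context and three candidate entities.
text = rng.standard_normal(d)
cands = rng.standard_normal((3, d))
W = rng.standard_normal((d, 2 * d)) * 0.1  # random, untrained weights
v = rng.standard_normal(d)

reps = np.stack([encode_pair(text, c, W) for c in cands])
pred, probs = decode(reps, v)
```

In the real model the encoder would be a pretrained transformer reading the text jointly with each candidate's description, and the decoder would attend over all candidate representations rather than apply a single linear scoring vector.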