Entity disambiguation (ED), which links ambiguous entity mentions to their referent entities in a knowledge base, is a core component of entity linking (EL). On the standardized ZELDA benchmark, existing generative approaches are more accurate than classification approaches. However, generative approaches require large-scale pre-training and are inefficient at generation time. More importantly, they often overlook entity descriptions, which can contain crucial information for distinguishing similar entities from each other. We propose an encoder-decoder model that disambiguates entities using more detailed entity descriptions. Given a text and a set of candidate entities, the encoder learns the interactions between the text and each candidate entity, producing one representation per candidate. The decoder then fuses the candidate representations and selects the correct entity. Our experiments on several entity disambiguation benchmarks demonstrate the strong and robust performance of this model, including a +1.5% gain over GENRE on the ZELDA benchmark. Furthermore, we integrate this approach into the retrieval/reader framework and observe a +1.5% improvement over EntQA in end-to-end entity linking on the GERBIL benchmark.
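The two-stage flow described in the abstract (encode each text/candidate pair separately, then fuse all candidate representations and select one) can be illustrated with a minimal sketch. This is not the proposed model: the learned encoder and decoder are replaced by hand-written stand-ins, and the function names (`encode_pair`, `fuse_and_select`, `disambiguate`), the word-overlap features, the scoring rule, and the example candidates are all assumptions made for illustration only.

```python
import math
import re
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for a learned tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def encode_pair(context: str, description: str) -> Tuple[float, float]:
    """Toy 'encoder': one feature vector per (text, candidate) pair.

    Assumption: a real encoder would be a trained neural network. Here the
    representation is just (word overlap, description vocabulary size).
    """
    ctx, desc = tokenize(context), tokenize(description)
    return (float(len(ctx & desc)), float(len(desc)))


def fuse_and_select(reps: List[Tuple[float, float]]) -> Tuple[int, List[float]]:
    """Toy 'decoder': look at all candidate representations together.

    Hypothetical scoring rule: overlap divided by sqrt(description size).
    The softmax is taken over the whole candidate set, so each candidate's
    probability depends on the other candidates.
    """
    scores = [ov / math.sqrt(n) if n else 0.0 for ov, n in reps]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def disambiguate(context: str, candidates: Dict[str, str]) -> Tuple[str, Dict[str, float]]:
    """Run the encoder once per candidate, then fuse and pick one entity."""
    names = list(candidates)
    reps = [encode_pair(context, candidates[name]) for name in names]
    best, probs = fuse_and_select(reps)
    return names[best], dict(zip(names, probs))


# Illustrative input; the descriptions are made up for this example.
context = "Jordan scored 40 points for the Chicago Bulls last night."
candidates = {
    "Michael Jordan": "American basketball player who won six championships with the Chicago Bulls",
    "Jordan (country)": "Country in Western Asia on the east bank of the Jordan River",
    "Michael B. Jordan": "American actor known for the films Creed and Black Panther",
}
entity, probabilities = disambiguate(context, candidates)
print(entity)
```

The point of the sketch is the division of labor: descriptions enter at the encoding stage, one candidate at a time, while the selection stage compares candidates against each other rather than scoring each in isolation.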