Entity linking aims to link ambiguous mentions to their corresponding entities in a knowledge base. One of the key challenges is the scarcity of labeled data in specific domains. Although dense retrievers have achieved excellent performance on several benchmarks, their performance drops significantly when only a limited amount of in-domain labeled data is available. In such a few-shot setting, we revisit the sparse retrieval method and propose an ELECTRA-based keyword extractor that denoises the mention context and constructs a better query. To train the extractor, we propose a distant supervision method that automatically generates training data from the tokens shared between mention contexts and entity descriptions. Experimental results on the ZESHEL dataset demonstrate that the proposed method outperforms state-of-the-art models by a significant margin across all test domains, showing the effectiveness of keyword-enhanced sparse retrieval.
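The distant supervision idea (label mention-context tokens by their overlap with the gold entity description) can be illustrated with a minimal sketch. Several assumptions are made here that the abstract does not specify: tokenization is a lowercase regex split rather than ELECTRA's subword tokenizer, a small hypothetical stopword list is excluded from matching, and the query is simply the mention tokens followed by the keyword-labeled context tokens. At inference time the trained extractor would predict these labels; here the distantly supervised labels stand in for its output. The helper names (`distant_labels`, `build_query`) are illustrative, not from the paper.

```python
import re

# Hypothetical stopword list (assumption): these tokens are never treated as keywords.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "was",
             "he", "she", "it", "his", "her", "by", "for", "with", "as", "at"}


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (stand-in for subword tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distant_labels(mention_context: str, entity_description: str) -> list[tuple[str, int]]:
    """Label each context token 1 if it also occurs in the gold entity description, else 0."""
    desc_vocab = {t for t in tokenize(entity_description) if t not in STOPWORDS}
    return [(tok, int(tok in desc_vocab)) for tok in tokenize(mention_context)]


def build_query(mention: str, labeled_context: list[tuple[str, int]]) -> str:
    """Sparse-retrieval query: mention tokens plus keyword tokens, deduplicated in order."""
    keywords = [tok for tok, label in labeled_context if label == 1]
    # dict.fromkeys keeps first-occurrence order while removing duplicates.
    return " ".join(dict.fromkeys(tokenize(mention) + keywords))


if __name__ == "__main__":
    context = "Luke trained with Yoda on Dagobah before facing Vader."
    description = "Yoda was a legendary Jedi Master who lived in exile on Dagobah."
    labels = distant_labels(context, description)
    print(labels)
    print(build_query("Yoda", labels))
```

In this toy example only "yoda" and "dagobah" receive label 1, so the query reduces to `yoda dagobah`; noisy context words such as "trained" or "before" are dropped before the query reaches a sparse retriever such as BM25.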