Few-shot and zero-shot entity linking focus on tail and emerging entities, which are more challenging to link but closer to real-world scenarios. The mainstream approach is the two-stage "retrieve and rerank" framework. In this paper, we propose a coarse-to-fine lexicon-based retriever that retrieves entity candidates effectively and operates in two layers. The first layer retrieves coarse-grained candidates by leveraging entity names, while the second layer narrows the search to fine-grained candidates among the coarse-grained ones. In addition, the second layer uses entity descriptions to disambiguate tail or new entities that share names with existing popular entities. Experimental results indicate that our approach achieves superior performance without requiring extensive finetuning in the retrieval stage. Notably, our approach ranked first in NLPCC 2023 Shared Task 6 on Chinese Few-shot and Zero-shot Entity Linking.
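To make the two-layer idea concrete, the following is a minimal sketch of a coarse-to-fine lexical retriever. It is an illustration under stated assumptions, not the system described in this paper: the abstract does not specify the lexical scoring function, so a Dice coefficient over character bigrams is used here as a stand-in, and the names `Entity`, `overlap_score`, `retrieve`, `coarse_k`, and `fine_k` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical knowledge-base record; field names are assumptions.
    entity_id: str
    name: str
    description: str


def char_bigrams(text: str) -> Counter:
    """Count character bigrams, a simple lexical unit that needs no tokenizer."""
    text = text.lower().strip()
    if len(text) < 2:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def overlap_score(query: str, target: str) -> float:
    """Dice coefficient over character bigrams (stand-in lexical scorer)."""
    q, t = char_bigrams(query), char_bigrams(target)
    total = sum(q.values()) + sum(t.values())
    if total == 0:
        return 0.0
    shared = sum((q & t).values())  # multiset intersection
    return 2.0 * shared / total


def retrieve(mention, context, entities, coarse_k=10, fine_k=3):
    """Return up to fine_k entities for a mention and its surrounding context."""
    # Layer 1 (coarse): match the mention against entity names only.
    by_name = [(overlap_score(mention, e.name), e) for e in entities]
    by_name = [pair for pair in by_name if pair[0] > 0.0]
    by_name.sort(key=lambda pair: pair[0], reverse=True)
    coarse = [e for _, e in by_name[:coarse_k]]

    # Layer 2 (fine): within the coarse set, compare the mention context
    # against entity descriptions to separate entities that share a name.
    by_desc = sorted(
        coarse,
        key=lambda e: overlap_score(context, e.description),
        reverse=True,
    )
    return by_desc[:fine_k]
```

In this sketch the second layer only ever sees candidates that survived the name match, so description scoring is limited to a small set. The description comparison is what allows a tail entity to outrank a popular entity with the same name when the mention context supports it.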