Few-shot and zero-shot entity linking focus on tail and emerging entities, which are more challenging to link but closer to real-world scenarios. The mainstream method is the "retrieve and rerank" two-stage framework. In this paper, we propose a coarse-to-fine lexicon-based retriever that retrieves entity candidates effectively and operates in two layers. The first layer retrieves coarse-grained candidates by leveraging entity names, while the second layer narrows the search to fine-grained candidates within the coarse-grained ones. In addition, the second layer uses entity descriptions to disambiguate tail or new entities that share names with existing popular entities. Experimental results indicate that our approach obtains superior performance without requiring extensive finetuning in the retrieval stage. Notably, our approach ranked first in NLPCC 2023 Shared Task 6 on Chinese Few-shot and Zero-shot Entity Linking.
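The two-layer retrieval described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring functions, so everything below is an assumption made for illustration: `name_score` uses Jaccard overlap of character bigrams for the name layer, `description_score` counts shared whitespace-separated tokens for the description layer, and the entity records, IDs, and parameters `coarse_k` and `fine_k` are hypothetical. For Chinese text, the whitespace tokenization would need to be replaced by character-level or segmenter-based tokens.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the set of lowercase character n-grams of `text`."""
    text = text.lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def name_score(mention, name):
    """Assumed coarse score: Jaccard overlap of character bigrams."""
    a, b = char_ngrams(mention), char_ngrams(name)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def description_score(context, description):
    """Assumed fine score: number of shared word tokens (with multiplicity)."""
    c = Counter(context.lower().split())
    d = Counter(description.lower().split())
    return sum((c & d).values())


def coarse_to_fine_retrieve(mention, context, entities, coarse_k=10, fine_k=3):
    # Layer 1 (coarse): rank all entities by name similarity to the mention,
    # keep the top `coarse_k` that have a non-zero score.
    scored = [(name_score(mention, e["name"]), e) for e in entities]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    coarse = [e for s, e in ranked[:coarse_k] if s > 0]

    # Layer 2 (fine): within the coarse set only, rerank by overlap between
    # the mention context and each entity description.
    fine = sorted(
        coarse,
        key=lambda e: description_score(context, e["description"]),
        reverse=True,
    )
    return [e["id"] for e in fine[:fine_k]]


# Hypothetical toy knowledge base: two entities share the name "Apple".
entities = [
    {"id": "E1", "name": "Apple",
     "description": "American technology company that makes phones and computers"},
    {"id": "E2", "name": "Apple",
     "description": "edible fruit produced by an apple tree"},
    {"id": "E3", "name": "Banana",
     "description": "elongated edible fruit"},
]

result = coarse_to_fine_retrieve(
    "apple", "she ate a fresh apple fruit from the tree", entities, fine_k=2
)
print(result)
```

In this toy example, the name layer alone cannot separate the two entities named "Apple", and it drops "Banana" because no bigrams overlap. The description layer then places the fruit entity ahead of the company because its description shares more tokens with the mention context, which mirrors the disambiguation role the abstract assigns to the second layer.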