Although biomedical entity linking (BioEL) has made significant progress with pre-trained language models, challenges remain for fine-grained and long-tailed entities. To address them, we present BioELQA, a novel model that treats biomedical entity linking as multiple-choice question answering. BioELQA first obtains candidate entities with a fast retriever, then jointly presents the mention and the candidate entities to a generator, which outputs the symbol associated with its chosen entity. This formulation enables explicit comparison among candidate entities, capturing fine-grained interactions between mentions and entities as well as among the entities themselves. To improve generalization to long-tailed entities, we retrieve similar labeled training instances as clues and concatenate them with the generator's input. Extensive experiments show that BioELQA outperforms state-of-the-art baselines on several datasets.
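The retrieve-then-choose formulation described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: the character n-gram retriever stands in for the unspecified "fast retriever", the prompt template and the helpers `retrieve`, `build_prompt`, and `symbol_to_entity` are hypothetical, and the generator itself is omitted (any sequence-to-sequence model that emits an option symbol would take its place).

```python
from string import ascii_uppercase


def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams of `text`."""
    s = f" {text.lower()} "
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between the character n-gram sets of two strings."""
    x, y = char_ngrams(a), char_ngrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0


def retrieve(query, items, key, k):
    """Toy retriever (assumption): top-k items by n-gram overlap with the query."""
    return sorted(items, key=lambda it: jaccard(query, key(it)), reverse=True)[:k]


def build_prompt(mention, candidates, clues):
    """Format retrieved clues, the mention, and lettered candidates as one input."""
    lines = [f"Clue: '{m}' refers to '{e}'." for m, e in clues]
    lines.append(f"Question: which entity does the mention '{mention}' refer to?")
    lines += [f"({sym}) {name}" for sym, name in zip(ascii_uppercase, candidates)]
    lines.append("Answer:")
    return "\n".join(lines)


def symbol_to_entity(symbol, candidates):
    """Map the generator's predicted option symbol back to its candidate entity."""
    return candidates[ascii_uppercase.index(symbol.strip("() ").upper())]


# Hypothetical knowledge base and labeled training instances.
entities = ["myocardial infarction", "cerebral infarction", "myocarditis", "hypertension"]
train = [("myocardial infarcts", "myocardial infarction"),
         ("elevated blood pressure", "hypertension")]

mention = "myocardial infarct"
candidates = retrieve(mention, entities, key=lambda e: e, k=3)
clues = retrieve(mention, train, key=lambda pair: pair[0], k=1)
prompt = build_prompt(mention, candidates, clues)
print(prompt)
```

Because all candidates appear side by side in a single input, the generator can compare them directly, and the clue lines give it labeled precedents for rare mentions. A predicted symbol such as `"A"` is resolved with `symbol_to_entity("A", candidates)`.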