Although biomedical entity linking (BioEL) has made significant progress with pre-trained language models, challenges remain for fine-grained and long-tailed entities. To address these challenges, we present BioELQA, a novel model that treats biomedical entity linking as multiple-choice question answering. BioELQA first obtains candidate entities with a fast retriever and then presents the mention and the candidate entities jointly to a generator, which outputs the symbol associated with its chosen entity. This formulation enables explicit comparison of the candidate entities, capturing fine-grained interactions between mentions and entities as well as among the entities themselves. To improve generalization to long-tailed entities, we retrieve similar labeled training instances as clues and concatenate them with the generator's input. Extensive experimental results show that BioELQA outperforms state-of-the-art baselines on several datasets.
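The multiple-choice formulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `retrieve`, `build_prompt`, and `decode_answer`, the prompt wording, and the toy knowledge base are all hypothetical, and a character-level similarity from Python's standard `difflib` stands in for the fast retriever. The generator itself is not modeled; the sketch only shows how a mention, its candidates, and retrieved training instances might be assembled into one input, and how an output symbol could be mapped back to an entity.

```python
from difflib import SequenceMatcher
import string


def retrieve(query, items, key=lambda x: x, k=3):
    """Toy stand-in for the fast retriever: return the k items most similar to the query."""
    def score(item):
        return SequenceMatcher(None, query.lower(), key(item).lower()).ratio()
    return sorted(items, key=score, reverse=True)[:k]


def build_prompt(mention, context, candidates, clues=()):
    """Format a mention and its candidate entities as a multiple-choice question.

    `clues` holds (mention, entity) pairs retrieved from labeled training data.
    Symbols are A, B, C, ...; this sketch supports at most 26 candidates.
    """
    symbols = string.ascii_uppercase[:len(candidates)]
    lines = []
    for clue_mention, clue_entity in clues:
        lines.append(f"Example: the mention '{clue_mention}' refers to '{clue_entity}'.")
    lines.append(f"Context: {context}")
    lines.append(f"Which entity does the mention '{mention}' refer to?")
    for symbol, name in zip(symbols, candidates):
        lines.append(f"({symbol}) {name}")
    lines.append("Answer:")
    return "\n".join(lines), dict(zip(symbols, candidates))


def decode_answer(generated, symbol_to_entity):
    """Map the generator's output symbol back to an entity, or None if unrecognized."""
    symbol = generated.strip().strip("()").upper()[:1]
    return symbol_to_entity.get(symbol)


# Hypothetical toy data, for illustration only.
kb = ["Myocardial Infarction", "Myocardial Ischemia", "Cerebral Infarction", "Heart Failure"]
train = [("heart attack", "Myocardial Infarction"), ("stroke", "Cerebral Infarction")]

mention = "myocardial infarct"
candidates = retrieve(mention, kb, k=3)
clues = retrieve(mention, train, key=lambda pair: pair[0], k=1)
prompt, mapping = build_prompt(
    mention, "Patient admitted with acute myocardial infarct.", candidates, clues
)
print(prompt)
```

In a full system, `prompt` would be passed to a generative language model and its output handed to `decode_answer`. Listing all candidates in one input is what allows the model to compare them against each other directly rather than scoring each in isolation.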