Biomedical entity linking (EL) consists of named entity recognition (NER) and named entity disambiguation (NED). EL models are trained on corpora labeled against a predefined knowledge base (KB). In practice, however, often only the entities within a subset of the KB are valuable to stakeholders. We call this scenario partial knowledge base inference: training an EL model with one KB and running inference on a part of it without further training. In this work, we give a detailed definition and evaluation procedures for this practically valuable but significantly understudied scenario, and we evaluate methods from three representative EL paradigms. We construct partial KB inference benchmarks and observe a catastrophic degradation in EL performance caused by a dramatic drop in precision. Our findings reveal that these EL paradigms cannot correctly handle unlinkable mentions (NIL), so they are not robust to partial KB inference. We also propose two simple yet effective remedies that combat the NIL issue with little computational overhead.
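To make the scenario concrete, the following is a minimal sketch of how partial KB evaluation can expose a precision drop. It is illustrative only: the `Prediction` type, the entity IDs, the confidence scores, and the threshold value are all hypothetical, and the score-threshold filter shown here is one plausible way to reject NIL mentions, not necessarily either of the methods proposed in this work.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    start: int        # character offset where the mention begins
    end: int          # character offset where the mention ends
    entity_id: str    # KB entity the model linked the mention to
    score: float      # hypothetical model confidence in [0, 1]


def precision_recall(preds, gold):
    """Strict span + entity matching; returns (precision, recall)."""
    pred_set = {(p.start, p.end, p.entity_id) for p in preds}
    gold_set = set(gold)
    tp = len(pred_set & gold_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    return precision, recall


def restrict_gold(gold, partial_kb):
    """Keep only gold mentions whose entity is in the partial KB.
    All other mentions become unlinkable (NIL) and must not be predicted."""
    return [g for g in gold if g[2] in partial_kb]


def threshold_filter(preds, min_score):
    """Assumed remedy: treat low-confidence links as NIL and drop them."""
    return [p for p in preds if p.score >= min_score]


# Gold annotations against the full training KB (toy data).
gold_full = [(0, 7, "C1"), (10, 18, "C2"), (25, 31, "C3"), (40, 47, "C4")]
partial_kb = {"C1", "C2"}
gold_partial = restrict_gold(gold_full, partial_kb)

# Inference against the partial KB: the model still links the two mentions
# whose true entities (C3, C4) were removed, forcing them onto in-KB entities.
preds = [
    Prediction(0, 7, "C1", 0.95),
    Prediction(10, 18, "C2", 0.90),
    Prediction(25, 31, "C1", 0.40),   # should be NIL
    Prediction(40, 47, "C2", 0.35),   # should be NIL
]

before = precision_recall(preds, gold_partial)
after = precision_recall(threshold_filter(preds, min_score=0.5), gold_partial)
print("without NIL handling:", before)
print("with score threshold:", after)
```

In this toy setup recall is unaffected, while every mention that should have been NIL counts as a false positive, which is the kind of precision loss the abstract describes. Real confidence scores are rarely this cleanly separated, so any threshold would need tuning on held-out data.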