Biomedical entity linking (EL) consists of named entity recognition (NER) and named entity disambiguation (NED). EL models are trained on corpora annotated against a predefined knowledge base (KB). In practice, however, stakeholders often care only about entities within a subset of that KB. We call this scenario partial knowledge base inference: an EL model is trained with one KB and then performs inference against a part of it without further training. In this work, we give a detailed definition and evaluation procedures for this practically valuable but significantly understudied scenario, and we evaluate methods from three representative EL paradigms. We construct partial KB inference benchmarks and observe a catastrophic degradation in EL performance caused by a dramatic drop in precision. Our findings reveal that these EL paradigms cannot correctly handle unlinkable mentions (NIL), so they are not robust to partial KB inference. We also propose two simple and effective remedies that combat the NIL issue with little computational overhead.
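To make the scenario concrete, the following toy sketch shows how precision can collapse when a model must link every mention to an entity in a partial KB, even when the correct entity lies outside it. Everything here is an illustrative assumption: the entity IDs, spans, scores, the helper names (`restrict_gold`, `precision_recall`, `apply_threshold`), and the threshold value are hypothetical. The score threshold is a generic way to abstain on likely NIL mentions and is not presented as either of the two remedies proposed in this work, which the abstract does not specify.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a mention


def restrict_gold(gold: Dict[Span, str], partial_kb: Set[str]) -> Dict[Span, str]:
    """Keep gold mentions whose entity is in the partial KB; all others become NIL."""
    return {span: ent for span, ent in gold.items() if ent in partial_kb}


def precision_recall(pred: Dict[Span, str], gold: Dict[Span, str]) -> Tuple[float, float]:
    """Strict matching: a prediction counts only if both span and entity agree with gold."""
    tp = sum(1 for span, ent in pred.items() if gold.get(span) == ent)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def apply_threshold(scored: Dict[Span, Tuple[str, float]], threshold: float) -> Dict[Span, str]:
    """Abstain (treat as NIL) whenever the linking score falls below the threshold."""
    return {span: ent for span, (ent, score) in scored.items() if score >= threshold}


# Hypothetical gold annotations against the full KB (IDs are made up).
gold_full = {(0, 5): "D1", (10, 15): "D2", (20, 25): "C1", (30, 35): "C2"}

# Hypothetical partial KB: only the "D" entities matter to the stakeholder.
partial_kb = {"D1", "D2"}
gold_partial = restrict_gold(gold_full, partial_kb)

# Hypothetical model output when restricted to the partial KB: the two
# mentions whose true entity is outside the partial KB are still linked
# to some in-KB entity, but with low scores.
scored_pred = {
    (0, 5): ("D1", 0.9),
    (10, 15): ("D2", 0.8),
    (20, 25): ("D1", 0.3),
    (30, 35): ("D2", 0.4),
}

naive = precision_recall({s: e for s, (e, _) in scored_pred.items()}, gold_partial)
thresholded = precision_recall(apply_threshold(scored_pred, 0.5), gold_partial)

print("naive (precision, recall):      ", naive)
print("thresholded (precision, recall):", thresholded)
```

In this toy setup recall is unaffected, while the forced links on unlinkable mentions halve precision; abstaining on low-scoring links restores it. Real scores are rarely this cleanly separated, so the threshold would need tuning on held-out data.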