Clinical named entity recognition (NER) aims to extract important entities from clinical narratives. Recent work has demonstrated that large language models (LLMs) can achieve strong performance on this task. While previous work focuses on proprietary LLMs, we investigate how open NER LLMs, trained specifically for entity recognition, perform in clinical NER. In this paper, we aim to improve them through a novel framework, entity decomposition with filtering, or EDF. Our key idea is to decompose the entity recognition task into several retrievals of sub-entity types. We also introduce a filtering mechanism to remove incorrect entities. Our experimental results demonstrate the efficacy of our framework across all metrics, models, datasets, and entity types. Our analysis shows that entity decomposition recovers previously missed entities, yielding substantial improvement. We further provide a comprehensive evaluation of our framework and an in-depth error analysis to pave the way for future work.
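The decompose-then-filter idea described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the sub-type mapping `SUB_TYPES`, the function names, the canned toy extractor, and the "mention must appear in the text" filter are all assumptions introduced here. The abstract does not specify which sub-entity types are used or how the filter decides that an entity is incorrect.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from an entity type to sub-entity types.
# The abstract does not list the actual sub-types; these are placeholders.
SUB_TYPES: Dict[str, List[str]] = {
    "problem": ["disease", "symptom"],
    "treatment": ["medication", "procedure"],
}

# (text, entity_type) -> mentions, e.g. a call to an open NER LLM (assumed interface).
Extractor = Callable[[str, str], List[str]]
# (text, mention, entity_type) -> True if the mention should be kept (assumed interface).
Keep = Callable[[str, str, str], bool]


def decompose_and_filter(
    text: str, entity_type: str, extract: Extractor, keep: Keep
) -> List[str]:
    """Query once per sub-type, merge the results, then filter them."""
    candidates: List[str] = []
    seen = set()
    # Fall back to the original type when no decomposition is defined.
    for sub_type in SUB_TYPES.get(entity_type, [entity_type]):
        for mention in extract(text, sub_type):
            key = mention.lower()
            if key not in seen:  # drop case-insensitive duplicates
                seen.add(key)
                candidates.append(mention)
    # Filtering step: remove candidates judged incorrect.
    return [m for m in candidates if keep(text, m, entity_type)]


def toy_extract(text: str, entity_type: str) -> List[str]:
    """Stand-in for a model call; returns canned output, including one bad span."""
    canned = {
        "disease": ["pneumonia", "sepsis"],  # "sepsis" is not in the note
        "symptom": ["fever", "cough", "Fever"],
    }
    return canned.get(entity_type, [])


def in_text(text: str, mention: str, entity_type: str) -> bool:
    """Toy filter (an assumption): keep only mentions that occur in the text."""
    return mention.lower() in text.lower()


note = "Patient has fever and cough, likely pneumonia."
entities = decompose_and_filter(note, "problem", toy_extract, in_text)
print(entities)
```

In this toy run, the two sub-type queries together surface more candidates than a single query would, and the filter then drops the one span that does not occur in the note. In practice both `extract` and `keep` would be backed by model calls rather than canned data.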