Extracting clinically relevant information from unstructured medical narratives such as admission notes, discharge summaries, and emergency case histories remains a challenge in clinical natural language processing (NLP). Medical Entity Recognition (MER) identifies meaningful concepts embedded in these records. Recent advancements in large language models (LLMs) have shown competitive MER performance; however, evaluations often focus on general entity types, offering limited utility for real-world clinical needs requiring finer-grained extraction. To address this gap, we rigorously evaluated the open-source LLaMA3 model for fine-grained medical entity recognition across 18 clinically detailed categories. To optimize performance, we employed three learning paradigms: zero-shot, few-shot, and fine-tuning with Low-Rank Adaptation (LoRA). To further enhance few-shot learning, we introduced two example selection methods based on token- and sentence-level embedding similarity, utilizing a pre-trained BioBERT model. Unlike prior work assessing zero-shot and few-shot performance on proprietary models (e.g., GPT-4) or fine-tuning different architectures, we ensured methodological consistency by applying all strategies to a unified LLaMA3 backbone, enabling fair comparison across learning settings. Our results showed that fine-tuned LLaMA3 surpasses zero-shot and few-shot approaches by 63.11% and 35.63%, respectively, achieving an F1 score of 81.24% in granular medical entity extraction.
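The similarity-based selection of few-shot examples mentioned above can be illustrated with a minimal sketch. It covers only the sentence-level variant and is not the authors' implementation: the function names `cosine` and `select_examples`, the example sentences, and the three-dimensional vectors are all hypothetical stand-ins. In practice the vectors would be sentence embeddings produced by a pre-trained BioBERT encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def select_examples(query_vec, pool, k=2):
    """Return the k pool sentences whose embeddings are closest to the query.

    pool is a list of (sentence, embedding) pairs; the embeddings here are
    toy values (assumption), not real BioBERT outputs.
    """
    ranked = sorted(pool, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:k]]


# Hypothetical annotated pool with toy sentence embeddings.
pool = [
    ("Patient denies chest pain.", [0.9, 0.1, 0.0]),
    ("Started metformin 500 mg daily.", [0.1, 0.9, 0.2]),
    ("Reports chest tightness on exertion.", [0.8, 0.2, 0.1]),
]

# Toy embedding of the sentence to be annotated.
query_vec = [1.0, 0.0, 0.0]

selected = select_examples(query_vec, pool, k=2)
print(selected)
```

The selected sentences, together with their gold annotations, would then be placed in the prompt as demonstrations ahead of the query sentence. A token-level variant would compare per-token embeddings instead of a single vector per sentence.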