Leveraging Large Language Models for Rare Disease Named Entity Recognition

Named Entity Recognition (NER) in the rare disease domain poses unique challenges due to limited labeled data, semantic ambiguity between entity types, and long-tail distributions. In this study, we evaluate the capabilities of GPT-4o for rare disease NER under low-resource settings, using zero-shot prompting, few-shot in-context learning, retrieval-augmented generation (RAG), and task-level fine-tuning. We design a structured prompting framework that encodes domain-specific knowledge and disambiguation rules for four entity types. We further introduce two semantically guided few-shot example selection methods to improve in-context performance while reducing labeling effort. Experiments on the RareDis Corpus show that GPT-4o achieves competitive or superior performance compared to BioClinicalBERT. Task-level fine-tuning is the strongest of the evaluated approaches and improves upon the previously reported BioClinicalBERT baseline. Cost-performance analysis reveals that few-shot prompting delivers high returns at low token budgets. RAG provides limited overall gains but can improve recall for challenging entity types, especially signs and symptoms. An error taxonomy highlights common failure modes such as boundary drift and type confusion, suggesting opportunities for post-processing and hybrid refinement. Our results demonstrate that prompt-optimized LLMs can serve as effective, scalable alternatives to traditional supervised models in biomedical NER, particularly in rare disease applications where annotated data are scarce.
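The two failure modes named in the abstract, boundary drift and type confusion, can be illustrated by comparing predicted spans against gold spans. The following is a minimal sketch, not the authors' taxonomy: the category names, the label strings (`RAREDISEASE`, `DISEASE`, `SIGN`, `SYMPTOM`), the example sentence, and the helpers `find_span` and `classify_prediction` are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type; label names here are illustrative assumptions


def find_span(text: str, phrase: str, label: str) -> Span:
    """Build a Span from the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return Span(start, start + len(phrase), label)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def classify_prediction(pred: Span, gold: List[Span]) -> str:
    """Assign one hypothetical error category to a predicted span."""
    if pred in gold:
        return "correct"
    # Same boundaries as a gold span, different entity type.
    if any((g.start, g.end) == (pred.start, pred.end) for g in gold):
        return "type_confusion"
    # Same entity type as an overlapping gold span, different boundaries.
    if any(overlaps(pred, g) and g.label == pred.label for g in gold):
        return "boundary_drift"
    # Overlaps a gold span, but both boundaries and type differ.
    if any(overlaps(pred, g) for g in gold):
        return "boundary_and_type"
    # No gold span overlaps the prediction at all.
    return "spurious"


# Invented example sentence and annotations, for illustration only.
text = "Patients with Marfan syndrome often present with aortic dilation."
gold = [
    find_span(text, "Marfan syndrome", "RAREDISEASE"),
    find_span(text, "aortic dilation", "SIGN"),
]
predictions = [
    find_span(text, "Marfan syndrome", "DISEASE"),    # right span, wrong type
    find_span(text, "with aortic dilation", "SIGN"),  # right type, span too wide
    find_span(text, "Patients", "SYMPTOM"),           # matches no gold span
]
categories = [classify_prediction(p, gold) for p in predictions]
print(categories)
```

Counting these categories over a test set gives a rough picture of where post-processing could help: boundary drift may be addressable with span-trimming rules, whereas type confusion points back at the disambiguation rules in the prompt.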

