Phenotype concept recognition (CR) is a fundamental task in biomedical text mining, enabling applications such as clinical diagnostics and knowledge graph construction. However, existing methods often require ontology-specific training and struggle to generalize across diverse text types and evolving biomedical terminology. We present AutoPCR, a prompt-based phenotype CR method that does not require ontology-specific training. AutoPCR performs CR in three stages: entity extraction using a hybrid of rule-based and neural tagging strategies, candidate retrieval via SapBERT, and entity linking through prompting a large language model. Experiments on four benchmark datasets show that AutoPCR achieves the best average and most robust performance across both mention-level and document-level evaluations, surpassing prior state-of-the-art methods. Further ablation and transfer studies demonstrate its inductive capability and generalizability to new ontologies.
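To make the three-stage design concrete, the following is a minimal, runnable sketch of an extract, retrieve, link pipeline. It is an illustration under loud assumptions, not the AutoPCR implementation: the toy ontology and its `TOY:` identifiers are invented, a stopword-based chunker stands in for the hybrid rule-based and neural extractor, character-level string similarity stands in for SapBERT embedding retrieval, and a fixed score threshold stands in for the large language model's linking decision. All function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy ontology; IDs and labels are invented for illustration.
ONTOLOGY = {
    "TOY:0001": "seizure",
    "TOY:0002": "microcephaly",
    "TOY:0003": "global developmental delay",
}

# Tiny stopword list used only by the toy chunker below.
STOPWORDS = {"the", "a", "an", "had", "has", "and", "with", "of"}


def extract_mentions(text):
    """Stage 1 stand-in: split text into chunks at stopwords and punctuation."""
    mentions, current = [], []
    for token in re.findall(r"[a-z]+|[^a-z\s]", text.lower()):
        if token.isalpha() and token not in STOPWORDS:
            current.append(token)
        elif current:
            mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


def retrieve_candidates(mention, k=2):
    """Stage 2 stand-in: rank ontology labels by string similarity.

    A real system would compare dense embeddings (e.g. from SapBERT)
    instead of using difflib's character-level ratio.
    """
    scored = [
        (concept_id, label, SequenceMatcher(None, mention, label).ratio())
        for concept_id, label in ONTOLOGY.items()
    ]
    scored.sort(key=lambda c: c[2], reverse=True)
    return scored[:k]


def link(mention, candidates, threshold=0.8):
    """Stage 3 stand-in: accept the top candidate or reject the mention.

    A real system would prompt an LLM with the mention, its context, and
    the candidate list; here a fixed threshold plays that role.
    """
    if candidates and candidates[0][2] >= threshold:
        return candidates[0][0]
    return None


def recognize(text):
    """Run all three stages and return (mention, concept_id) pairs."""
    results = []
    for mention in extract_mentions(text):
        concept_id = link(mention, retrieve_candidates(mention))
        if concept_id is not None:
            results.append((mention, concept_id))
    return results


if __name__ == "__main__":
    print(recognize("The patient had seizures and microcephaly."))
```

Because no stage is fitted to the ontology, swapping in a different `ONTOLOGY` dictionary requires no retraining, which is the property the abstract attributes to a prompt-based approach. In this toy version the chunk "patient" is extracted but rejected at the linking stage, since no candidate label is similar enough.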