Large Language Models (LLMs) demonstrate remarkable versatility across NLP tasks but face distinct challenges in the biomedical domain due to the complexity of its language and the scarcity of annotated data. This paper investigates the application of LLMs in the biomedical domain by exploring strategies to enhance their performance on the named entity recognition (NER) task. Our study reveals the importance of meticulously designed prompts in the biomedical setting. Strategic selection of in-context examples yields a marked improvement, offering an approximately 15-20\% increase in F1 score across all benchmark datasets for biomedical few-shot NER. Additionally, our results indicate that integrating external biomedical knowledge via prompting strategies can enhance the proficiency of general-purpose LLMs to meet the specialized needs of biomedical NER. Leveraging a medical knowledge base, our proposed method, DiRAG, inspired by Retrieval-Augmented Generation (RAG), boosts the zero-shot F1 score of LLMs for biomedical NER. Code is released at \url{https://github.com/masoud-monajati/LLM_Bio_NER}