Recent proprietary large language models (LLMs), such as GPT-4, have reached a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generation. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed that search a knowledge corpus for documents and append them, unconditionally or selectively, to the LLM input. However, when existing methods are applied to domain-specific problems, they generalize poorly, fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a reliable framework for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting on generated responses. We use 84k filtered biomedical instruction sets to train Self-BioRAG so that it can assess its generated explanations with customized reflective tokens. Our work indicates that domain-specific components, such as a retriever, a domain-related document corpus, and instruction sets, are necessary for adhering to domain-related instructions. On three major medical question-answering benchmark datasets, Self-BioRAG achieves a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Overall, our analysis shows that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge, as a medical expert does. We release our data and code for training our framework components, together with model weights (7B and 13B), to enhance capabilities in the biomedical and clinical domains.
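The control flow summarized above (decide whether to retrieve, generate with or without evidence, then self-assess the result) can be illustrated with a minimal sketch. This is not the released Self-BioRAG implementation: the function names, the `Candidate` type, the toy corpus, and the word-overlap scoring are all hypothetical stand-ins for the trained retriever, generator, and reflective tokens.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    answer: str
    evidence: Optional[str]  # None when no document was retrieved
    score: float             # stand-in for a reflective-token judgment


def answer_with_optional_retrieval(
    question: str,
    needs_retrieval: Callable[[str], bool],
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str, Optional[str]], str],
    critique: Callable[[str, Optional[str], str], float],
    top_k: int = 3,
) -> Candidate:
    """Retrieve only when needed, then keep the best self-assessed answer."""
    docs = retrieve(question, top_k) if needs_retrieval(question) else []
    if not docs:
        # Answer from encoded (parametric) knowledge alone.
        answer = generate(question, None)
        return Candidate(answer, None, critique(question, None, answer))
    # One candidate answer per retrieved document; the critique picks a winner.
    candidates = []
    for doc in docs:
        answer = generate(question, doc)
        candidates.append(Candidate(answer, doc, critique(question, doc, answer)))
    return max(candidates, key=lambda c: c.score)


# --- Toy stand-ins (assumptions for illustration only) ---
CORPUS = [
    "Metformin is a first-line drug for type 2 diabetes.",
    "Aspirin inhibits cyclooxygenase.",
]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_needs_retrieval(question: str) -> bool:
    return "drug" in tokens(question)


def toy_retrieve(question: str, k: int) -> List[str]:
    q = tokens(question)
    return sorted(CORPUS, key=lambda d: -len(q & tokens(d)))[:k]


def toy_generate(question: str, doc: Optional[str]) -> str:
    return f"Based on evidence: {doc}" if doc else "Answer from encoded knowledge."


def toy_critique(question: str, doc: Optional[str], answer: str) -> float:
    # Word overlap between question and evidence; 0.0 without evidence.
    return float(len(tokens(question) & tokens(doc))) if doc else 0.0


if __name__ == "__main__":
    result = answer_with_optional_retrieval(
        "Which drug is first-line for type 2 diabetes?",
        toy_needs_retrieval, toy_retrieve, toy_generate, toy_critique,
    )
    print(result.answer)
```

In a trained system, the retrieval decision and the critique would presumably come from the model's own reflective tokens rather than from external heuristics; the callables here only mark where those decisions enter the loop.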