Recent proprietary large language models (LLMs), such as GPT-4, have reached a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generation. To address problems that cannot be solved with the knowledge encoded in LLMs alone, various retrieval-augmented generation (RAG) methods have been developed; these search a knowledge corpus for documents and append them, unconditionally or selectively, to the LLM input for generation. However, when existing methods are applied to domain-specific problems, they generalize poorly, fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a reliable framework for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting on generated responses. We train Self-BioRAG on 84k filtered biomedical instruction sets so that it can assess its generated explanations with customized reflective tokens. Our work shows that domain-specific components, such as a retriever, a domain-related document corpus, and instruction sets, are necessary for adhering to domain-related instructions. On three major medical question-answering benchmark datasets, Self-BioRAG achieves significant performance gains, with a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with 7B or fewer parameters. Overall, our analysis indicates that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge, as a medical expert does. We release our data and code for training our framework components, along with model weights (7B and 13B), to enhance capabilities in biomedical and clinical domains.