Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models

Recent proprietary large language models (LLMs), such as GPT-4, have reached a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generation. To address challenges that the encoded knowledge of LLMs still cannot handle, various retrieval-augmented generation (RAG) methods have been developed; they search a knowledge corpus for documents and append them, unconditionally or selectively, to the LLM input for generation. However, when existing methods are applied to different domain-specific problems, poor generalization becomes apparent, leading to retrieval of incorrect documents or inaccurate judgments. In this paper, we introduce Self-BioRAG, a reliable framework for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting on generated responses. We use 84k filtered biomedical instruction sets to train Self-BioRAG, which can assess its generated explanations with customized reflective tokens. Our work shows that domain-specific components, such as a retriever, a domain-related document corpus, and instruction sets, are necessary for adhering to domain-related instructions. On three major medical question-answering benchmark datasets, Self-BioRAG achieves significant performance gains, with a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Overall, our analysis indicates that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge, as a medical expert does. We release our data and code for training our framework components, along with model weights (7B and 13B), to enhance capabilities in biomedical and clinical domains.
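
The retrieve-if-needed and self-reflection behavior described in the abstract can be illustrated with a toy sketch. This is not the Self-BioRAG implementation: the names `needs_retrieval`, `retrieve`, `overlap_score`, and `answer` are hypothetical, word overlap stands in for a trained domain retriever, and a simple vocabulary check stands in for the model's reflective tokens.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Answer:
    text: str
    evidence: Optional[str]  # None when the answer used no retrieved document
    retrieved: bool


def overlap_score(query: str, doc: str) -> int:
    """Toy relevance score: number of lowercase words shared by query and doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Return the k documents with the highest toy overlap score."""
    ranked = sorted(corpus, key=lambda d: overlap_score(query, d), reverse=True)
    return ranked[:k]


def needs_retrieval(question: str, known_terms: Set[str]) -> bool:
    """Hypothetical stand-in for a retrieval-deciding reflective token:
    retrieve when the question contains a word outside the toy 'encoded knowledge'."""
    return any(word not in known_terms for word in question.lower().split())


def answer(question: str, corpus: List[str], known_terms: Set[str]) -> Answer:
    # Step 1: decide whether retrieval is needed at all (selective retrieval).
    if not needs_retrieval(question, known_terms):
        return Answer(text="[encoded knowledge] " + question, evidence=None, retrieved=False)
    # Step 2: fetch candidate documents from the domain corpus.
    candidates = retrieve(question, corpus, k=2)
    # Step 3: toy self-reflection: keep the candidate that best supports the question.
    best = max(candidates, key=lambda d: overlap_score(question, d))
    return Answer(text="[grounded in evidence] " + best, evidence=best, retrieved=True)
```

In this sketch, a question made up only of known terms is answered without retrieval, while any other question triggers retrieval and an evidence-selection step. In the actual framework, both decisions are made by the trained model rather than by hand-written rules.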

