Large Language Models (LLMs), although powerful in general domains, often perform poorly on domain-specific tasks such as medical question answering (QA). In addition, LLMs tend to function as "black boxes," making it challenging to modify their behavior. To address these problems, our work employs a transparent process of retrieval-augmented generation (RAG), aiming to improve LLM responses without the need for fine-tuning or retraining. Specifically, we propose a comprehensive retrieval strategy to extract medical facts from an external knowledge base and then inject them into the LLM's query prompt. Focusing on medical QA, we evaluate the impact of different retrieval models and of the number of injected facts on LLM performance using the MedQA-SMILE dataset. Notably, our retrieval-augmented Vicuna-7B model improved in accuracy from 44.46% to 48.54%. This work underscores the potential of RAG to enhance LLM performance, offering a practical approach to mitigating the challenges posed by black-box LLMs.
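The retrieve-and-inject step described above can be sketched as follows. This is a minimal illustration, not the paper's actual retrieval model: it assumes a toy in-memory knowledge base and ranks facts by simple token overlap with the question, whereas the paper compares dedicated retrieval models.

```python
def retrieve_facts(question, facts, k=2):
    """Rank knowledge-base facts by word overlap with the question; return top-k.

    A stand-in scorer for illustration; a real system would use a trained
    retrieval model (e.g., dense or sparse retrieval) instead.
    """
    q_tokens = set(question.lower().split())
    scored = sorted(
        facts,
        key=lambda f: len(q_tokens & set(f.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question, facts):
    """Inject the retrieved facts into the LLM's query prompt."""
    context = "\n".join(f"- {f}" for f in facts)
    return (
        f"Relevant medical facts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

if __name__ == "__main__":
    kb = [
        "Metformin is a first-line treatment for type 2 diabetes.",
        "Aspirin inhibits platelet aggregation.",
        "Insulin lowers blood glucose levels.",
    ]
    q = "Which drug is first-line for type 2 diabetes?"
    print(build_prompt(q, retrieve_facts(q, kb)))
```

The number of injected facts (`k`) is the hyperparameter whose effect on QA accuracy the evaluation varies.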