Large Language Models (LLMs), although powerful in general domains, often perform poorly on domain-specific tasks such as medical question answering (QA). Moreover, they tend to function as "black boxes," making it challenging to modify their behavior. To address these problems, our study delves into retrieval-augmented generation (RAG), aiming to improve LLM responses without the need for fine-tuning or retraining. Specifically, we propose a comprehensive retrieval strategy to extract medical facts from an external knowledge base, and then inject them into the query prompt for the LLM. Focusing on medical QA using the MedQA-SMILE dataset, we evaluate the impact of different retrieval models and of the number of facts provided to the LLM. Notably, our retrieval-augmented Vicuna-7B model improved in accuracy from 44.46% to 48.54%. This work underscores the potential of RAG to enhance LLM performance, offering a practical approach to mitigate the challenges of black-box LLMs.
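The retrieve-then-prompt pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the toy knowledge base, the word-overlap scoring function, and the prompt template are all assumptions standing in for the paper's retrieval models and MedQA-SMILE data.

```python
def retrieve_facts(query, knowledge_base, k=3):
    """Rank facts by word overlap with the query and return the top k.

    A real system would use a learned retriever (e.g. dense embeddings);
    simple lexical overlap is used here only to keep the sketch self-contained.
    """
    q_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda fact: len(q_words & set(fact.lower().split())),
        reverse=True,
    )
    return scored[:k]


def build_prompt(question, facts):
    """Inject the retrieved facts into the query prompt sent to the LLM."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Use the following medical facts to answer the question.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical miniature knowledge base for illustration.
kb = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "Aspirin inhibits platelet aggregation.",
    "Insulin lowers blood glucose levels.",
]

question = "What is a first-line drug for type 2 diabetes?"
prompt = build_prompt(question, retrieve_facts(question, kb, k=2))
print(prompt)
```

The key design point is that the LLM itself is untouched: all adaptation happens in the prompt, which is what makes the approach applicable to black-box models.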