Large Language Models (LLMs), although powerful in general domains, often perform poorly on domain-specific tasks such as medical question answering (QA). Moreover, they tend to function as black boxes, which makes their behavior difficult to modify. To address this, we study model editing via in-context learning, aiming to improve LLM responses without fine-tuning or retraining. Specifically, we propose a comprehensive retrieval strategy that extracts medical facts from an external knowledge base and incorporates them into the query prompt for the LLM. Focusing on medical QA with the MedQA-SMILE dataset, we evaluate the impact of different retrieval models and of the number of facts provided to the LLM. Notably, our edited Vicuna model improved in accuracy from 44.46% to 48.54%, a gain of 4.08 percentage points. This work underscores the potential of model editing to enhance LLM performance and offers a practical approach to mitigating the challenges of black-box LLMs.
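The retrieve-then-prompt idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`retrieve_facts`, `build_prompt`), the three-sentence toy knowledge base, the bag-of-words cosine retriever, and the prompt wording are hypothetical and are not the retrieval models, knowledge base, or prompt template used in the study.

```python
import math
import re
from collections import Counter

# Toy stand-in for an external medical knowledge base (illustrative only).
TOY_KB = [
    "Metformin is a first-line drug for type 2 diabetes.",
    "Warfarin inhibits vitamin K epoxide reductase.",
    "Penicillin inhibits bacterial cell wall synthesis.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_facts(question, knowledge_base, k=2):
    """Return up to k facts ranked by similarity to the question.

    Facts with zero lexical overlap are dropped. This lexical retriever is
    an assumed placeholder for whichever retrieval model is being compared.
    """
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(f))), f) for f in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for score, fact in scored[:k] if score > 0]


def build_prompt(question, options, facts):
    """Place the retrieved facts ahead of the question in the LLM prompt."""
    lines = ["Use the following facts if they are relevant:"]
    lines += [f"- {fact}" for fact in facts]
    lines += ["", f"Question: {question}"]
    lines += [f"{letter}. {text}" for letter, text in options.items()]
    lines.append("Answer with the letter of the best option.")
    return "\n".join(lines)


if __name__ == "__main__":
    question = "Which drug is first-line for type 2 diabetes?"
    options = {"A": "Warfarin", "B": "Metformin", "C": "Penicillin"}
    facts = retrieve_facts(question, TOY_KB, k=2)
    print(build_prompt(question, options, facts))
```

In this sketch the parameter `k` plays the role of the number of facts provided to the LLM, and swapping the body of `retrieve_facts` corresponds to trying a different retrieval model; the model weights are never touched, only the prompt changes.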