Large Language Models (LLMs) are widely used in critical fields such as healthcare, education, and finance because of their proficiency across a broad range of language tasks. However, LLMs are prone to generating factually incorrect responses, or "hallucinations," which can erode users' trust. To address this issue, we propose a multi-stage framework that first generates rationales, then verifies them and refines the incorrect ones, and finally uses them as supporting references when generating the answer. The generated rationales make the answer more transparent: together with their references to the context, they show how the model arrived at the answer. In this paper, we demonstrate the framework's effectiveness in improving the quality of responses to drug-related inquiries in the life sciences industry. Compared with traditional Retrieval Augmented Generation (RAG), our framework makes OpenAI GPT-3.5-turbo 14-25% more faithful and 16-22% more accurate on two datasets. Furthermore, fine-tuning on samples based on our framework improves the accuracy of smaller open-access LLMs by 33-42% and is competitive with RAG on commercial models.
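The multi-stage flow described in the abstract (generate rationales, verify and refine them, then answer with the rationales as references) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the prompt wording, the function names, the `max_rounds` parameter, and the choice to drop rationales that still fail verification after refinement are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns the model's text.
LLM = Callable[[str], str]


@dataclass
class Result:
    rationales: List[str]  # rationales that passed verification
    answer: str            # answer generated with those rationales as references


def generate_rationales(llm: LLM, question: str, context: str) -> List[str]:
    """Stage 1: ask the model for supporting facts, one per line."""
    prompt = (
        "List the facts from the context needed to answer the question, "
        f"one per line.\nContext: {context}\nQuestion: {question}"
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]


def is_supported(llm: LLM, rationale: str, context: str) -> bool:
    """Stage 2a: check a single rationale against the retrieved context."""
    prompt = (
        "Is the statement fully supported by the context? Answer yes or no.\n"
        f"Context: {context}\nStatement: {rationale}"
    )
    return llm(prompt).strip().lower().startswith("yes")


def refine(llm: LLM, rationale: str, context: str) -> str:
    """Stage 2b: rewrite a rationale that failed verification."""
    prompt = (
        "Rewrite the statement so that it is fully supported by the context.\n"
        f"Context: {context}\nStatement: {rationale}"
    )
    return llm(prompt).strip()


def answer_with_rationales(
    llm: LLM, question: str, context: str, max_rounds: int = 1
) -> Result:
    """Run all stages; max_rounds bounds the refinement attempts (assumption)."""
    verified: List[str] = []
    for rationale in generate_rationales(llm, question, context):
        ok = is_supported(llm, rationale, context)
        for _ in range(max_rounds):
            if ok:
                break
            rationale = refine(llm, rationale, context)
            ok = is_supported(llm, rationale, context)
        if ok:  # assumption: rationales that still fail are discarded
            verified.append(rationale)

    # Stage 3: answer using only the verified rationales as numbered references.
    refs = "\n".join(f"[{i}] {r}" for i, r in enumerate(verified, start=1))
    prompt = (
        "Answer the question using only the numbered rationales as references.\n"
        f"{refs}\nQuestion: {question}"
    )
    return Result(rationales=verified, answer=llm(prompt).strip())
```

Passing the model in as a plain callable keeps the sketch independent of any particular provider; in practice `llm` would wrap a call to a hosted or fine-tuned model, and `context` would come from the retrieval step of a RAG system.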