Recently, Large Language Models (LLMs) have been increasingly used to support various decision-making tasks, assisting humans in making informed decisions. However, when an LLM confidently provides incorrect information, it can lead humans to suboptimal decisions. To prevent LLMs from generating incorrect information on topics they are unsure of and to improve the accuracy of generated content, prior work has proposed Retrieval-Augmented Generation (RAG), in which external documents are referenced when generating responses. However, previous RAG methods focus only on retrieving the documents most relevant to the input query, without specifically aiming to ensure that the human user's decisions are well-calibrated. To address this limitation, we propose a novel retrieval method called Calibrated Retrieval-Augmented Generation (CalibRAG), which ensures that decisions informed by RAG are well-calibrated. We then empirically validate that CalibRAG improves both calibration and accuracy compared to other baselines across various datasets.
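To make the limitation concrete, standard RAG retrieval ranks documents purely by their similarity to the query, independent of how the retrieved evidence will shape the confidence of the downstream answer. The following is a minimal, illustrative sketch of such query-relevance-only retrieval, using a toy bag-of-words similarity in place of a learned dense encoder; all function names here are hypothetical and not from the paper.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words "embedding": raw token counts. A stand-in for
    # a learned dense encoder; purely illustrative.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    # Standard RAG retrieval: rank documents by similarity to the
    # query alone, with no notion of how the evidence will affect
    # the calibration of the final decision.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The Eiffel Tower is located in Paris, France.",
    "Photosynthesis converts sunlight into chemical energy.",
    "The Great Wall of China is visible in satellite images.",
]
top = retrieve("Where is the Eiffel Tower?", docs)
```

Note that nothing in this pipeline models whether the retrieved document will make the user's downstream decision over- or under-confident, which is the gap the abstract's proposed method targets.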