Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues such as hallucinations and misalignment. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding them in external knowledge, but the theoretical understanding of their generation risks remains unexplored. In this paper, we answer: 1) whether RAG can indeed lead to low generation risks, 2) how to provide provable guarantees on the generation risks of RAG and vanilla LLMs, and 3) what sufficient conditions enable RAG models to reduce generation risks. We propose C-RAG, the first framework to certify generation risks for RAG models. Specifically, we provide a conformal risk analysis for RAG models and certify an upper confidence bound on generation risks, which we refer to as the conformal generation risk. We also provide theoretical guarantees on conformal generation risks for general bounded risk functions under test distribution shifts. We prove that RAG achieves a lower conformal generation risk than a single LLM when the quality of the retrieval model and transformer is non-trivial. Our extensive empirical results demonstrate the soundness and tightness of our conformal generation risk guarantees across four widely used NLP datasets and four state-of-the-art retrieval models.
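To make the notion of an upper confidence bound on a bounded generation risk concrete, the following is a minimal sketch and not the C-RAG procedure itself. It assumes i.i.d. calibration risks in [0, 1] and substitutes a plain Hoeffding inequality for whatever bound the framework actually uses. The function name `hoeffding_risk_upper_bound` and the toy risk values are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def hoeffding_risk_upper_bound(risks: Sequence[float], delta: float) -> float:
    """Upper confidence bound on the expected risk.

    Assumes the risks are i.i.d. samples in [0, 1]. Under that assumption,
    Hoeffding's inequality gives, with probability at least 1 - delta:
        E[risk] <= mean(risks) + sqrt(ln(1 / delta) / (2 * n)).
    This is an illustrative stand-in, not the bound used by C-RAG.
    """
    if len(risks) == 0:
        raise ValueError("need at least one calibration risk")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if any(r < 0.0 or r > 1.0 for r in risks):
        raise ValueError("risks must be bounded in [0, 1]")

    n = len(risks)
    empirical_risk = sum(risks) / n
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    # The risk cannot exceed 1, so clip the bound accordingly.
    return min(1.0, empirical_risk + slack)


if __name__ == "__main__":
    # Toy, made-up per-example risks (e.g. 1 minus a quality score).
    calibration_risks = [0.2, 0.4, 0.1, 0.3, 0.5, 0.2, 0.3, 0.4]
    bound = hoeffding_risk_upper_bound(calibration_risks, delta=0.1)
    print(f"risk <= {bound:.3f} with probability >= 0.9")
```

The slack term shrinks as the number of calibration samples grows, which is one simple way to see why tightness of such a guarantee depends on the calibration set size.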