Large Language Models (LLMs) exhibit substantial capabilities yet encounter challenges, including hallucination, outdated knowledge, and untraceable reasoning processes. Retrieval-augmented generation (RAG) has emerged as a promising solution, integrating knowledge from external databases to mitigate these challenges. However, inappropriate retrieved passages can hinder the LLMs' capacity to generate comprehensive and high-quality responses. Prior RAG studies on robustness to retrieval noise often confine themselves to a limited set of noise types, deviating from real-world retrieval environments and limiting practical applicability. In this study, we first investigate retrieval noises and categorize them into three distinct types, reflecting real-world environments. We analyze the impact of these various retrieval noises on the robustness of LLMs. Subsequently, we propose a novel RAG approach known as Retrieval-augmented Adaptive Adversarial Training (RAAT). RAAT leverages adaptive adversarial training to dynamically adjust the model's training process in response to retrieval noises. Concurrently, it employs multi-task learning to ensure the model's capacity to internally recognize noisy contexts. Extensive experiments demonstrate that the LLaMA-2 7B model trained using RAAT exhibits significant improvements in F1 and EM scores under diverse noise conditions. For reproducibility, we release our code and data at: https://github.com/calubkk/RAAT.
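To make the training scheme concrete, below is a minimal, hypothetical sketch of the adaptive-adversarial step described above. All names (the three noise categories, `generation_loss`, `lambda_cls`, and the loss values) are illustrative assumptions, not the paper's actual implementation: the loss under each retrieval context is compared, the currently most harmful context is selected for the update, and an auxiliary noise-recognition loss is added via multi-task learning.

```python
# Hypothetical sketch of an adaptive adversarial training step (assumed names/values).
# The real RAAT implementation is at https://github.com/calubkk/RAAT.

# Assumed categorization: one golden context plus three retrieval-noise types.
CONTEXT_TYPES = ["golden", "relevant_noise", "irrelevant_noise", "counterfactual_noise"]

def generation_loss(context_type: str) -> float:
    """Stand-in for the LM's negative log-likelihood under each retrieved context."""
    simulated = {
        "golden": 0.9,
        "relevant_noise": 1.4,
        "irrelevant_noise": 1.1,
        "counterfactual_noise": 1.7,
    }
    return simulated[context_type]

def raat_step(lambda_cls: float = 0.5):
    """One adaptive adversarial step: train on the hardest context,
    plus a multi-task noise-recognition loss."""
    losses = {t: generation_loss(t) for t in CONTEXT_TYPES}
    # Adaptive adversarial selection: pick the context the model
    # currently handles worst (highest loss).
    worst = max(losses, key=losses.get)
    adv_loss = losses[worst]
    # Multi-task term: an auxiliary classifier recognizing the noise type
    # (placeholder cross-entropy value here).
    cls_loss = 0.3
    total = adv_loss + lambda_cls * cls_loss
    return worst, total
```

With the simulated losses above, the step selects the counterfactual-noise context (loss 1.7) and combines it with the weighted classification term; in practice both losses would be backpropagated through the shared LM.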