Retrieval Augmented Generation (RAG) represents a significant advancement in artificial intelligence, combining a retrieval phase with a generative phase, the latter typically powered by large language models (LLMs). Current common practice in RAG is to use "instructed" LLMs, which are fine-tuned with supervised training to enhance their ability to follow instructions and aligned with human preferences using state-of-the-art techniques. Contrary to popular belief, our study demonstrates that base models outperform their instructed counterparts on RAG tasks by 20% on average under our experimental settings. This finding challenges the prevailing assumption that instructed LLMs are superior in RAG applications. Further investigation reveals a more nuanced picture, questioning fundamental aspects of RAG and suggesting the need for a broader discussion of the topic; or, as Fromm would have it, "Seldom is a glance at the statistics enough to understand the meaning of the figures".
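To make the two phases concrete, the following is a minimal, self-contained sketch of a RAG pipeline. It is illustrative only: the toy keyword-overlap retriever, the stubbed `generate` function, and the `rag_answer` helper are assumptions introduced here for exposition, not the retrievers, models, or prompts used in our experiments.

```python
# Minimal RAG sketch: a retrieval phase followed by a generative phase.
# The retriever and generator below are toy stand-ins, not the paper's setup.

from typing import List

CORPUS = [
    "Retrieval Augmented Generation combines a retriever with a generator.",
    "Base LLMs are pre-trained only with a language-modeling objective.",
    "Instructed LLMs are further fine-tuned to follow instructions.",
]

def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Retrieval phase: rank passages by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(query_terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(prompt: str) -> str:
    """Generative phase stand-in; a real system would call an LLM here."""
    return f"[model completion conditioned on a prompt of {len(prompt)} characters]"

def rag_answer(query: str) -> str:
    """Assemble retrieved passages into a prompt and pass it to the generator."""
    passages = retrieve(query, CORPUS)
    prompt = "Context:\n" + "\n".join(passages) + f"\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)

if __name__ == "__main__":
    print(rag_answer("What does Retrieval Augmented Generation combine?"))
```

The only point the sketch is meant to convey is the division of labor: the retriever selects supporting passages, and the generator (base or instructed LLM) produces the answer conditioned on them; our comparison concerns which kind of generator to plug into that second step.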