Large language models (LLMs) encapsulate a vast amount of factual information within their pre-trained weights, as evidenced by their ability to answer diverse questions across different domains. However, this knowledge is inherently limited and depends heavily on the characteristics of the training data. Consequently, using external datasets to incorporate new information, or to refine the capabilities of LLMs on previously seen information, poses a significant challenge. In this study, we compare two common approaches: fine-tuning and retrieval-augmented generation (RAG). We evaluate both approaches on a variety of knowledge-intensive tasks across different topics. Our findings reveal that while fine-tuning offers some improvement, RAG consistently outperforms it, both for existing knowledge encountered during training and for entirely new knowledge. Moreover, we find that LLMs struggle to learn new factual information through fine-tuning, and that exposing them to numerous variations of the same fact during training could alleviate this problem.