Large language models (LLMs) encapsulate a vast amount of factual information within their pre-trained weights, as evidenced by their ability to answer diverse questions across different domains. However, this knowledge is inherently limited, relying heavily on the characteristics of the training data. Consequently, using external datasets to incorporate new information or refine the capabilities of LLMs on previously seen information poses a significant challenge. In this study, we compare two common approaches: unsupervised fine-tuning and retrieval-augmented generation (RAG). We evaluate both approaches on a variety of knowledge-intensive tasks across different topics. Our findings reveal that while unsupervised fine-tuning offers some improvement, RAG consistently outperforms it, both for existing knowledge encountered during training and entirely new knowledge. Moreover, we find that LLMs struggle to learn new factual information through unsupervised fine-tuning, and that exposing them to numerous variations of the same fact during training could alleviate this problem.