Large language models (LLMs) encode a vast amount of factual information in their pre-trained weights, as evidenced by their ability to answer diverse questions across different domains. However, this knowledge is inherently limited and depends heavily on the characteristics of the training data. Consequently, using external datasets to incorporate new information, or to refine an LLM's handling of previously seen information, poses a significant challenge. In this study, we compare two common approaches: unsupervised fine-tuning and retrieval-augmented generation (RAG). We evaluate both approaches on a variety of knowledge-intensive tasks across different topics. Our findings reveal that while unsupervised fine-tuning offers some improvement, RAG consistently outperforms it, both for existing knowledge encountered during training and for entirely new knowledge. Moreover, we find that LLMs struggle to learn new factual information through unsupervised fine-tuning, and that exposing them to numerous variations of the same fact during training could alleviate this problem.