Large language models (LLMs) memorize a vast amount of factual knowledge and perform strongly across diverse tasks and domains. However, their performance degrades on less popular or low-frequency concepts and entities, for example in domain-specific applications. The two prominent approaches to improving LLM performance on low-frequency topics are retrieval-augmented generation (RAG) and fine-tuning (FT) over synthetic data. This paper explores and evaluates the impact of RAG and FT on customizing LLMs to handle low-frequency entities in question answering tasks. Our findings indicate that FT significantly boosts performance across entities of varying popularity, especially in the most and least popular groups, while RAG surpasses other methods. Additionally, the success of both RAG and FT is amplified by advancements in retrieval and data augmentation techniques. We release our data and code at https://github.com/informagi/RAGvsFT.