Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicability in real-world scenarios. As a result, research on evaluating and improving the factuality of LLMs has attracted a lot of attention recently. In this survey, we critically analyze existing work with the aim of identifying the major challenges and their associated causes, pointing out potential solutions for improving the factuality of LLMs, and analyzing the obstacles to automated factuality evaluation for open-ended text generation. We further offer an outlook on where future research should go.