Large language models (LLMs), especially when instruction-tuned for chat, have become part of our daily lives, freeing people from the process of searching, extracting, and integrating information from multiple sources by offering a straightforward answer to a variety of questions in a single place. Unfortunately, in many cases, LLM responses are factually incorrect, which limits their applicability in real-world scenarios. As a result, evaluating and improving the factuality of LLMs has attracted considerable research attention recently. In this survey, we critically analyze existing work with the aim of identifying the major challenges and their associated causes, pointing out potential solutions for improving the factuality of LLMs, and analyzing the obstacles to automated factuality evaluation for open-ended text generation. We further offer an outlook on where future research should go.