A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

As Large Language Models (LLMs) continue to advance in their ability to write human-like text, a key challenge remains: their tendency to hallucinate, generating content that appears factual but is ungrounded. Hallucination is arguably the biggest hindrance to safely deploying these powerful LLMs in real-world production systems that affect people's lives. Widespread adoption of LLMs in practical settings depends heavily on addressing and mitigating hallucinations. Unlike traditional AI systems focused on limited tasks, LLMs have been exposed to vast amounts of online text during training. While this gives them impressive language fluency, it also means they can extrapolate information from biases in the training data, misinterpret ambiguous prompts, or modify information to align superficially with the input. This becomes alarming when we rely on language generation for sensitive applications, such as summarizing medical records or financial analysis reports. This paper presents a comprehensive survey of over 32 techniques developed to mitigate hallucination in LLMs. Notable among these are Retrieval Augmented Generation (Lewis et al., 2021), Knowledge Retrieval (Varshney et al., 2023), CoNLI (Lei et al., 2023), and CoVe (Dhuliawala et al., 2023). Furthermore, we introduce a detailed taxonomy that categorizes these methods by parameters such as dataset utilization, common tasks, feedback mechanisms, and retriever types. This classification helps distinguish the diverse approaches specifically designed to tackle hallucination in LLMs. Additionally, we analyze the challenges and limitations inherent in these techniques, providing a solid foundation for future research on hallucinations and related phenomena in LLMs.
