Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.
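The RAG pattern described above can be sketched minimally: retrieve passages relevant to a query, then prepend them to the prompt so the model grounds its answer in external knowledge rather than parametric memory alone. The toy corpus, the word-overlap retriever, and the prompt template below are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal RAG sketch (illustrative only; not the paper's implementation).
# Step 1: retrieve the passages most relevant to the query.
# Step 2: build a prompt that instructs the model to answer from that context.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank passages by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(q_words & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str, corpus: list[str]) -> str:
    """Combine the retrieved context with the user query into one prompt."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return (
        "Answer using ONLY the context below. "
        "If the context does not contain the answer, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# Hypothetical corpus for illustration.
corpus = [
    "Case A v. B (2019) addressed contract liability.",
    "Case C v. D (2021) concerned data privacy.",
    "Unrelated note about court scheduling.",
]
print(build_rag_prompt("Which case concerned data privacy?", corpus))
```

In a real deployment the lexical retriever would typically be replaced by dense embedding search over a document index, but the grounding instruction in the prompt is the part that counters hallucination: the model is told to defer to retrieved evidence, which, as our results show, it does not always do when the evidence conflicts with its pre-trained knowledge.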