Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate, that is, to generate plausible but false information, poses a significant challenge. The problem has real consequences: in recent court cases, the use of ChatGPT led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.