Large language models (LLMs) such as ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate, that is, to generate plausible but false information, poses a significant challenge. The problem is critical, as recent court cases show: the use of ChatGPT led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by augmenting prompts with external knowledge. We empirically compare RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG improves accuracy in some cases but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for deploying RAG and discuss implications for developing more trustworthy LLMs.