Large Language Models (LLMs) have revolutionized various domains with their extensive knowledge and creative capabilities. However, a critical issue with LLMs is their tendency to produce outputs that diverge from factual reality. This is particularly concerning in sensitive applications such as medical consultation and legal advice, where accuracy is paramount. In this paper, we introduce the LLM factoscope, a novel Siamese network-based model that leverages the inner states of LLMs for factual detection. Our investigation reveals distinguishable patterns in LLMs' inner states when they generate factual versus non-factual content. We demonstrate the LLM factoscope's effectiveness across various LLM architectures, achieving over 96% accuracy in factual detection. Our work opens a new avenue for using LLMs' inner states for factual detection and encourages further exploration of LLMs' inner workings to improve reliability and transparency.