Despite their success across a wide range of tasks, large language models (LLMs) remain prone to hallucination. We introduce Truth Forest, a method that enhances truthfulness in LLMs by uncovering hidden truth representations with multi-dimensional orthogonal probes. Specifically, it imposes orthogonality constraints on the probes to obtain multiple orthogonal bases for modeling truth. We also introduce Random Peek, a systematic technique that considers an extended range of positions within the sequence, reducing the gap between discerning and generating truth features in LLMs. With this approach, we improve the truthfulness of Llama-2-7B from 40.8\% to 74.5\% on TruthfulQA, and we observe similarly significant improvements in fine-tuned models. We further conduct a thorough analysis of truth features using probes. Our visualizations show that the orthogonal probes capture complementary truth-related features, forming well-defined clusters that reveal the inherent structure of the dataset. Code: \url{https://github.com/jongjyh/trfr}
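To make the idea of an orthogonality constraint on probes concrete, the following is a minimal sketch, not the loss used by Truth Forest. It assumes each probe is a linear probe represented by a single weight vector, and it assumes one plausible penalty: the sum of squared cosine similarities between every pair of probe directions. The function name `orthogonality_penalty` and the example vectors are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonality_penalty(probes):
    """Sum of squared cosine similarities over all distinct probe pairs.

    Assumption: each probe is one weight vector (a linear probe).
    The value is zero exactly when all probe directions are mutually
    orthogonal, and it grows as the directions overlap.
    """
    penalty = 0.0
    for i in range(len(probes)):
        for j in range(i + 1, len(probes)):
            norm_i = math.sqrt(dot(probes[i], probes[i]))
            norm_j = math.sqrt(dot(probes[j], probes[j]))
            cosine = dot(probes[i], probes[j]) / (norm_i * norm_j)
            penalty += cosine ** 2
    return penalty


# Hypothetical probe directions in a 3-dimensional feature space.
orthogonal_probes = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
overlapping_probes = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]

print(orthogonality_penalty(orthogonal_probes))
print(orthogonality_penalty(overlapping_probes))
```

In a training loop, a term like this would typically be added to each probe's classification loss with a weighting coefficient, so that later probes are pushed toward directions the earlier probes do not already cover.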