Large Language Models (LLMs) often encode whether a statement is true as a vector in their residual stream activations. These vectors, also known as truth vectors, have been studied in prior work; however, how they change when context is introduced remains unexplored. We study this question by measuring (1) the directional change ($\theta$) between the truth vectors with and without context and (2) the relative magnitude of the truth vectors upon adding context. Across four LLMs and four datasets, we find that (1) the truth vectors with and without context are roughly orthogonal in early layers and converge directionally in middle layers, and their alignment either stabilizes or continues to increase in later layers; (2) adding context generally increases the magnitude of the truth vector, i.e., it amplifies the separation between true and false representations in the activation space; (3) larger models distinguish relevant from irrelevant context mainly through directional change ($\theta$), while smaller models express this distinction through magnitude differences. We also find that context conflicting with parametric knowledge produces larger geometric changes than parametrically aligned context. To the best of our knowledge, this is the first work to provide a geometric characterization of how context transforms truth vectors in the activation space of LLMs.
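To make the two measurements concrete, the following is a minimal sketch, assuming the truth vector at a given layer is the difference of mean residual-stream activations over true versus false statements (a common construction in prior probing work; the paper's exact definition may differ):

```python
import numpy as np

def truth_vector(acts_true: np.ndarray, acts_false: np.ndarray) -> np.ndarray:
    """Difference-of-means truth direction at one layer, from activations
    of shape [n_statements, d_model]. Assumed definition, for illustration."""
    return acts_true.mean(axis=0) - acts_false.mean(axis=0)

def directional_change(v_no_ctx: np.ndarray, v_ctx: np.ndarray) -> float:
    """Angle theta (degrees) between the truth vectors computed
    without and with context; ~90 means roughly orthogonal."""
    cos = np.dot(v_no_ctx, v_ctx) / (
        np.linalg.norm(v_no_ctx) * np.linalg.norm(v_ctx)
    )
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def relative_magnitude(v_no_ctx: np.ndarray, v_ctx: np.ndarray) -> float:
    """Norm ratio of the truth vector after vs. before adding context;
    values > 1 indicate amplified true/false separation."""
    return float(np.linalg.norm(v_ctx) / np.linalg.norm(v_no_ctx))
```

Under these assumed definitions, the per-layer profile of $\theta$ traces the orthogonal-then-converging pattern reported above, and the norm ratio captures the magnitude amplification.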