Large Language Models (LLMs) update their behavior in context, which can be viewed as a form of Bayesian inference. However, the structure of the latent hypothesis space over which this inference operates remains unclear. In this work, we propose that LLMs assign beliefs over a low-dimensional geometric space, which we call a conceptual belief space, and that in-context learning corresponds to a trajectory through this space as beliefs are updated over time. Using story understanding as a natural setting for dynamic belief updating, we combine behavioral and representational analyses to study these trajectories. We find that (1) belief updates are well described as trajectories on low-dimensional, structured manifolds; (2) this structure is reflected consistently in both model behavior and internal representations and can be decoded with simple linear probes to predict behavior; and (3) interventions on these representations causally steer belief trajectories, with effects that can be predicted from the geometry of the conceptual space. Together, our results provide a geometric account of belief dynamics in LLMs, grounding Bayesian interpretations of in-context learning in structured conceptual representations.
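To make the idea of a belief trajectory concrete, the following is a minimal illustrative sketch of sequential Bayesian updating over a small discrete hypothesis set. It is not the method used in this work: the hypothesis labels, the likelihood values, and the function names `bayes_update` and `belief_trajectory` are all assumptions made up for illustration. Each belief state is a probability vector, so with three hypotheses every state lies on a two-dimensional simplex, and the sequence of states traces a path through that space as evidence arrives.

```python
def bayes_update(prior, likelihoods):
    """One Bayesian update: posterior is proportional to prior * likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("evidence has zero probability under every hypothesis")
    return [u / total for u in unnormalized]


def belief_trajectory(prior, evidence_likelihoods):
    """Return every belief state visited, starting with the prior."""
    trajectory = [list(prior)]
    for likelihoods in evidence_likelihoods:
        trajectory.append(bayes_update(trajectory[-1], likelihoods))
    return trajectory


# Hypothetical setup: three candidate readings of a story character.
hypotheses = ["honest", "deceptive", "mistaken"]
prior = [1 / 3, 1 / 3, 1 / 3]

# Each row holds P(observed sentence | hypothesis); the numbers are invented.
evidence = [
    [0.2, 0.6, 0.5],
    [0.1, 0.7, 0.4],
    [0.3, 0.8, 0.2],
]

trajectory = belief_trajectory(prior, evidence)
for step, belief in enumerate(trajectory):
    print(step, [round(b, 3) for b in belief])
```

In this toy setting the hypothesis space is fixed by hand; the question raised in the abstract is what structure such a space has inside an LLM, which this sketch does not address.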