Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. While recent work has identified structured conceptual representations within these models, it remains unclear whether the models functionally rely on such representations for reasoning. Here we investigate the internal processing of LLMs during in-context inference across diverse tasks. Our results reveal a conceptual subspace emerging in middle-to-late layers, whose representational structure persists across contexts. Using causal mediation analyses, we demonstrate that this subspace is not merely an epiphenomenon but plays a causal role in inference and is functionally central to model predictions. We further identify a layer-wise progression in which attention heads in early-to-middle layers integrate contextual cues to construct and refine the subspace, which later layers then leverage to generate predictions. Together, these findings provide evidence that LLMs dynamically construct and use structured latent representations in context for inference, offering insights into the computational processes underlying flexible adaptation.
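The abstract does not specify how the causal interventions on the conceptual subspace are carried out. As a rough illustration only, the sketch below shows two interventions commonly used for this kind of analysis: ablating a hidden state's component inside a subspace, and patching that component in from a different run. The function names (`orthonormalize`, `project`, `ablate`, `patch`) and the toy three-dimensional vectors are hypothetical and stand in for real model activations; none of this is taken from the authors' method.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors: List[Vector]) -> List[Vector]:
    """Gram-Schmidt: turn spanning vectors into an orthonormal basis."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project(h: Vector, basis: List[Vector]) -> Vector:
    """Component of h that lies inside the subspace spanned by basis."""
    out = [0.0] * len(h)
    for b in basis:
        c = dot(h, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out


def ablate(h: Vector, basis: List[Vector]) -> Vector:
    """Remove the subspace component from h (projection ablation)."""
    return [hi - pi for hi, pi in zip(h, project(h, basis))]


def patch(h_target: Vector, h_source: Vector, basis: List[Vector]) -> Vector:
    """Swap the subspace component of h_target for that of h_source,
    leaving everything orthogonal to the subspace untouched."""
    p_target = project(h_target, basis)
    p_source = project(h_source, basis)
    return [t - a + s for t, a, s in zip(h_target, p_target, p_source)]


# Toy example: a 2-D "conceptual subspace" inside a 3-D hidden state.
# These values are illustrative, not real activations.
basis = orthonormalize([[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
h_clean = [1.0, 2.0, 3.0]
h_other = [4.0, 5.0, 6.0]
h_ablated = ablate(h_clean, basis)
h_patched = patch(h_clean, h_other, basis)
```

In a mediation-style experiment, one would feed `h_ablated` or `h_patched` back into the later layers and compare the resulting predictions with the unmodified run; a large change attributable to the subspace component alone is the kind of evidence that would support a causal role.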