Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. While recent work has identified structured, human-like conceptual representations within these models, it remains unclear whether they functionally rely on such representations for reasoning. Here we investigate the internal processing of LLMs during in-context concept inference. Our results reveal a conceptual subspace emerging in middle to late layers, whose representational structure persists across contexts. Using causal mediation analyses, we demonstrate that this subspace is not merely an epiphenomenon but is functionally central to model predictions, establishing its causal role in inference. We further identify a layer-wise progression where attention heads in early-to-middle layers integrate contextual cues to construct and refine the subspace, which is subsequently leveraged by later layers to generate predictions. Together, these findings provide evidence that LLMs dynamically construct and use structured, latent representations in context for inference, offering insights into the computational processes underlying flexible adaptation.
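The causal mediation analysis mentioned above can be illustrated with a minimal toy sketch of subspace activation patching: the component of a hidden state lying in a candidate concept subspace is swapped between two inputs, and one then checks whether downstream predictions follow the patched subspace content rather than the residual. Everything here (dimensions, the linear readout `W`, the basis `B`, `predict`) is a hypothetical illustration, not the paper's actual setup or any specific model's internals.

```python
# Toy sketch of causal mediation via subspace patching (hypothetical setup).
import numpy as np

rng = np.random.default_rng(0)
d = 16                                          # toy hidden dimension
k = 2                                           # candidate concept-subspace dimension
B = np.linalg.qr(rng.normal(size=(d, k)))[0]    # orthonormal basis of the subspace
W = rng.normal(size=(3, d))                     # toy linear readout over 3 "classes"

def predict(h):
    """Toy prediction: argmax of the linear readout."""
    return int(np.argmax(W @ h))

def patch_subspace(h_target, h_source, basis):
    """Replace h_target's component in the subspace with h_source's,
    leaving the orthogonal (residual) component of h_target untouched."""
    P = basis @ basis.T                         # projector onto the subspace
    return h_target - P @ h_target + P @ h_source

# Hidden states from two different contexts (toy stand-ins).
h_a = rng.normal(size=d)
h_b = rng.normal(size=d)

h_patched = patch_subspace(h_a, h_b, B)
# If the subspace causally mediates the prediction, predict(h_patched)
# should track h_b's concept content, not h_a's.
pred_patched = predict(h_patched)
```

In a real mediation experiment this swap would be applied to residual-stream activations at a chosen layer during a forward pass, and the effect measured as the shift in output probabilities; the sketch only shows the geometry of the intervention.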