Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. While recent work has identified structured, human-like conceptual representations within these models, it remains unclear whether they functionally rely on such representations for reasoning. Here we investigate the internal processing of LLMs during in-context concept inference. Our results reveal a conceptual subspace emerging in middle-to-late layers, whose representational structure persists across contexts. Using causal mediation analyses, we demonstrate that this subspace is not merely an epiphenomenon but is functionally central to model predictions, establishing its causal role in inference. We further identify a layer-wise progression in which attention heads in early-to-middle layers integrate contextual cues to construct and refine the subspace, which later layers then leverage to generate predictions. Together, these findings provide evidence that LLMs dynamically construct and use structured, latent representations in context for inference, offering insights into the computational processes underlying flexible adaptation.
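The causal mediation analysis referred to above can be illustrated with a small, self-contained sketch: identify a low-dimensional conceptual subspace from hidden states, patch only the subspace component of one run's activation into another run, and observe the effect on a readout. The code below is a hypothetical NumPy illustration on synthetic data; the names (`concept_dirs`, `probe`, the PCA-style subspace construction) are assumptions for exposition, not the paper's actual pipeline or models.

```python
"""Minimal sketch (hypothetical, NumPy-only) of subspace activation patching,
the kind of causal-mediation test described in the abstract. All tensors are
synthetic stand-ins for the hidden states of a real LLM layer."""
import numpy as np

rng = np.random.default_rng(0)
d_model, n_concepts, n_samples = 64, 4, 200

# Synthetic "hidden states": each sample carries a concept signal plus noise.
concept_dirs = rng.normal(size=(n_concepts, d_model))
labels = rng.integers(0, n_concepts, size=n_samples)
hidden = concept_dirs[labels] + 0.3 * rng.normal(size=(n_samples, d_model))

# 1. Identify a conceptual subspace: principal directions of the class means.
class_means = np.stack([hidden[labels == c].mean(0) for c in range(n_concepts)])
_, _, vt = np.linalg.svd(class_means - class_means.mean(0), full_matrices=False)
subspace = vt[: n_concepts - 1]              # (k, d_model) orthonormal basis

def project(x, basis):
    """Component of x lying inside the subspace spanned by `basis`."""
    return (x @ basis.T) @ basis

# 2. Causal mediation by patching: replace only the subspace component of a
#    "corrupted" hidden state with that of a "clean" one, and read out the
#    effect through a linear probe (a stand-in for the prediction head).
probe = np.linalg.lstsq(hidden, np.eye(n_concepts)[labels], rcond=None)[0]

clean, corrupt = hidden[0], hidden[1]        # two runs with different concepts
patched = corrupt - project(corrupt, subspace) + project(clean, subspace)

for name, h in [("clean", clean), ("corrupt", corrupt), ("patched", patched)]:
    print(name, "-> predicted concept:", int(np.argmax(h @ probe)))
```

If the subspace is causally implicated, the patched run's prediction follows the clean run's concept even though everything outside the subspace still comes from the corrupted activation; if it were an epiphenomenon, patching it would leave the prediction unchanged.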