Reasoning language models, which generate long chains of thought, dramatically outperform non-reasoning language models on abstract problems. However, the internal mechanisms that enable this superior performance remain poorly understood. We present a mechanistic analysis of how QwQ-32B - a model specifically trained to produce extensive reasoning traces - processes abstract structural information. On Mystery Blocksworld - a semantically obfuscated planning domain - we find that QwQ-32B gradually improves its internal representations of actions and concepts during reasoning. The model develops abstract encodings that capture structure rather than specific action names. Through steering experiments, we establish causal evidence that these adaptations improve problem solving: injecting refined representations from successful traces boosts accuracy, while symbolic representations can replace many obfuscated encodings with minimal performance loss. We conclude that one factor driving reasoning-model performance is in-context refinement of token representations, which we dub Fluid Reasoning Representations.
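To make the steering experiments concrete, here is a minimal sketch of how such an intervention could be run, assuming a HuggingFace-style causal LM and a forward-hook-based injection. The layer index, scaling factor `alpha`, the file `refined_action_rep.pt`, and the `inject` hook are all illustrative assumptions, not the paper's exact setup.

```python
# Minimal activation-steering sketch (illustrative, not the paper's exact method).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"  # model under study
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Hypothetical refined representation: e.g., the hidden state of an obfuscated
# action token extracted late in a successful reasoning trace.
refined_vec = torch.load("refined_action_rep.pt")  # shape: (hidden_size,)

layer_idx, alpha = 20, 4.0  # assumed injection layer and strength

def inject(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states;
    # add the steering vector to the current token's residual stream.
    hidden = output[0]
    hidden[:, -1, :] += alpha * refined_vec.to(hidden.dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[layer_idx].register_forward_hook(inject)
prompt = "..."  # an obfuscated Mystery Blocksworld problem statement
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=512)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```

Under this reading, the abstract's accuracy comparison would come from running the same problems with and without the hook registered.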