Recent Large Reasoning Models trained via reinforcement learning exhibit a "natural" alignment with human cognitive costs: the computation they spend tracks the difficulty humans experience. However, we show that the prevailing paradigm of reasoning distillation -- training student models to mimic these models' reasoning traces via Supervised Fine-Tuning (SFT) -- fails to transmit this cognitive structure. Testing the "Hán Dān Xué Bù" (Superficial Mimicry) hypothesis across 14 models, we find that distillation induces a "Functional Alignment Collapse": while teacher models mirror human difficulty scaling ($\bar{r}=0.64$), this alignment degrades significantly in distilled students ($\bar{r}=0.34$), which often underperform their own pre-distillation baselines ("Negative Transfer"). Our analysis suggests that SFT induces a "Cargo Cult" effect, in which students ritualistically replicate the linguistic form of reasoning (verbosity) without internalizing the teacher's dynamic resource allocation policy. Consequently, reasoning distillation decouples computational cost from cognitive demand, revealing that human-like cognition is an emergent property of active reinforcement, not passive imitation.
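To make the alignment score concrete, the following is a minimal sketch of how a mean correlation such as $\bar{r}$ could be computed. It rests on assumptions not stated in the abstract: that model cost is measured as reasoning-token count per item, that human cost is a per-item difficulty measure such as response time, and that $\bar{r}$ is the Pearson correlation averaged over tasks. The function names and the data layout are hypothetical, and the abstract does not specify the exact procedure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_alignment(per_task):
    """Average the per-task correlation between human and model cost.

    per_task maps a task name to a pair (human_costs, model_costs), where
    both entries are per-item lists in the same item order. This layout is
    an assumption made for illustration.
    """
    rs = [pearson_r(human, model) for human, model in per_task.values()]
    return sum(rs) / len(rs)


# Toy values for illustration only; these are not data from the paper.
toy = {
    "task_a": ([1.0, 2.0, 3.0], [200, 400, 600]),   # cost rises with difficulty
    "task_b": ([1.0, 2.0, 3.0], [600, 400, 200]),   # cost falls with difficulty
}
score = mean_alignment(toy)
```

Under these assumptions, a model whose token usage rises with human difficulty on every task scores near 1, while a model that is uniformly verbose regardless of difficulty scores near 0, which is the pattern the "Cargo Cult" description points to.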