Recent Large Reasoning Models trained via reinforcement learning exhibit a "natural" alignment with human cognitive costs. However, we show that the prevailing paradigm of reasoning distillation -- training student models to mimic these traces via Supervised Fine-Tuning (SFT) -- fails to transmit this cognitive structure. Testing the "Hán Dān Xué Bù" (Superficial Mimicry) hypothesis across 14 models, we find that distillation induces a "Functional Alignment Collapse": while teacher models mirror human difficulty scaling ($\bar{r}=0.64$), distilled students significantly degrade this alignment ($\bar{r}=0.34$), often underperforming their own pre-distillation baselines ("Negative Transfer"). Our analysis suggests that SFT induces a "Cargo Cult" effect, where students ritualistically replicate the linguistic form of reasoning (verbosity) without internalizing the teacher's dynamic resource allocation policy. Consequently, reasoning distillation decouples computational cost from cognitive demand, revealing that human-like cognition is an emergent property of active reinforcement, not passive imitation.