Measuring and Mitigating Post-hoc Rationalization in Reverse Chain-of-Thought Generation

Reverse Chain-of-Thought Generation (RCG) synthesizes reasoning traces from query-answer pairs, but it risks producing post-hoc rationalizations: when a model can see the answer during generation, the answer acts as a cognitive anchor that shapes the entire explanation. We formalize this phenomenon through a three-level measurement hierarchy of lexical, entropic, and probabilistic anchoring, which capture surface artifacts, entropy dynamics, and latent answer dependence, respectively. We then analyze semantic suppression, the intuitive mitigation strategy of instructing the model to ignore the answer, and find it counterproductive: although it reduces lexical overlap, it paradoxically increases entropic and probabilistic anchoring. Drawing on Ironic Process Theory from cognitive psychology, we attribute this failure to active monitoring of the forbidden answer, which inadvertently deepens dependence on it. To break this cycle, we propose Structural Skeleton-guided Reasoning (SSR), a two-phase approach that first generates an answer-invariant functional skeleton and then uses that skeleton to guide generation of the full trace. By redirecting information flow toward structural planning and away from answer monitoring, SSR consistently reduces anchoring at all three levels. We further introduce Distilled SSR (SSR-D), which fine-tunes models on teacher-generated SSR traces to ensure reliable structural adherence. Experiments on open-ended reasoning benchmarks show that SSR-D achieves up to 10% improvement over suppression baselines while preserving out-of-distribution (OOD) generalization.
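As a rough illustration of the two-phase flow described above, the following Python sketch separates skeleton generation from trace generation. Everything in it is an assumption made for illustration rather than the authors' implementation: the prompt wording, the function names (`build_skeleton_prompt`, `build_trace_prompt`, `ssr_generate`), the `generate` callable standing in for a language model, and in particular the choice to withhold the answer entirely from the first phase as one simple way to obtain an answer-invariant skeleton.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a prompt string, returns generated text.
Generator = Callable[[str], str]


def build_skeleton_prompt(query: str) -> str:
    """Phase 1 prompt. Assumption: the answer is not shown at all here."""
    return (
        "Outline the functional steps needed to solve the problem below. "
        "Do not state any final result.\n"
        f"Problem: {query}"
    )


def build_trace_prompt(query: str, answer: str, skeleton: str) -> str:
    """Phase 2 prompt. The skeleton, not the answer, is meant to drive the structure."""
    return (
        "Follow the outline step by step to write a full reasoning trace.\n"
        f"Problem: {query}\n"
        f"Outline:\n{skeleton}\n"
        f"Reference answer: {answer}"
    )


def ssr_generate(query: str, answer: str, generate: Generator) -> Tuple[str, str]:
    """Run both phases and return (skeleton, trace)."""
    skeleton = generate(build_skeleton_prompt(query))
    trace = generate(build_trace_prompt(query, answer, skeleton))
    return skeleton, trace


def make_recording_stub() -> Tuple[Generator, List[str]]:
    """Stand-in for a real model: records each prompt and returns a placeholder."""
    prompts: List[str] = []

    def stub(prompt: str) -> str:
        prompts.append(prompt)
        return f"[output {len(prompts)}]"

    return stub, prompts


if __name__ == "__main__":
    stub, prompts = make_recording_stub()
    skeleton, trace = ssr_generate("What is 17 * 3?", "51", stub)
    # The first prompt never contains the answer; the second carries the skeleton.
    print("51" in prompts[0], "51" in prompts[1], skeleton in prompts[1])
```

The recording stub only makes the information flow inspectable; in practice `generate` would wrap a real model call, and the distilled variant (SSR-D) would instead fine-tune a model on traces produced this way.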
