Many large language models (LLMs) use reasoning to generate responses but do not reveal their full reasoning traces (a.k.a. chains of thought), instead outputting only final answers and brief reasoning summaries. To demonstrate that hiding reasoning traces does not prevent users from "stealing" a model's reasoning capabilities, we introduce trace inversion models that, given only the inputs, answers, and (optionally) reasoning summaries exposed by a target model, generate detailed, synthetic reasoning traces. We show that (1) traces synthesized by trace inversion have high overlap with the ground-truth reasoning traces (when available), and (2) fine-tuning student models on inverted traces substantially improves their reasoning. For example, compared to fine-tuning on just the answers and summaries of GPT-5 mini, a commercial black-box LLM, fine-tuning Qwen-2.5-7B-Instruct on traces inverted from those answers and summaries improves its performance from 56.8% to 77.6% on MATH500 and from 11.7% to 42.3% on JEEBench.