The emergence of discourse-like tokens such as "wait" and "therefore" in large language models (LLMs) has offered a unique window into their reasoning processes. However, systematic analyses of how such signals vary across training strategies and model scales remain lacking. In this paper, we analyze token-level signals through token probabilities across various models. We find that specific tokens strongly correlate with reasoning correctness, varying with training strategies while remaining stable across model scales. A closer look at the "wait" token in relation to answer probability demonstrates that models fine-tuned on small-scale datasets acquire reasoning ability through such signals but exploit them only partially. This work provides a systematic lens to observe and understand the dynamics of LLM reasoning.