Chain-of-Thought (CoT) prompting has significantly enhanced the reasoning abilities of large language models. However, recent studies have shown that models can still perform complex reasoning tasks even when the CoT is replaced with filler (hidden) characters (e.g., "..."), leaving open questions about how models internally process and represent reasoning steps. In this paper, we investigate methods to decode these hidden characters in transformer models trained on filler CoT sequences. By analyzing layer-wise representations with the logit lens method and examining token rankings, we demonstrate that the hidden characters can be recovered without loss of performance. Our findings provide insight into the internal mechanisms of transformer models and open avenues for improving the interpretability and transparency of language model reasoning.
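The logit lens method named in the abstract projects each layer's intermediate hidden state directly through the model's unembedding matrix to obtain per-layer vocabulary logits, whose token rankings can then be inspected. A minimal NumPy sketch with toy random weights (all sizes and names here are hypothetical stand-ins, not the paper's actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size, n_layers = 8, 16, 4

# Toy stand-ins for per-layer residual-stream states and the unembedding matrix.
hidden_states = [rng.normal(size=d_model) for _ in range(n_layers)]
W_U = rng.normal(size=(d_model, vocab_size))  # unembedding: d_model -> vocab

def logit_lens(h, W_U):
    """Project an intermediate hidden state to vocabulary logits.

    Real implementations typically apply the model's final LayerNorm to h
    before the projection; that step is omitted in this toy sketch.
    """
    return h @ W_U

for layer, h in enumerate(hidden_states):
    logits = logit_lens(h, W_U)
    ranking = np.argsort(-logits)  # token ids from most to least likely
    print(f"layer {layer}: top token id = {ranking[0]}")
```

Inspecting how `ranking` changes across layers is the basic diagnostic: if the hidden (filler) tokens' identities are recoverable, the corresponding token ids should rise in the per-layer rankings.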