Chain-of-thought (CoT) decoding enables language models to improve reasoning performance, at the cost of high generation latency. Recent proposals have explored variants of contemplation tokens, a term we introduce that refers to special tokens used during inference to allow for extra computation. Prior work has considered fixed-length sequences drawn from a discrete set of embeddings as contemplation tokens. Here we propose Compressed Chain-of-Thought (CCoT), a framework to generate contentful and continuous contemplation tokens of variable sequence length. The generated contemplation tokens are compressed representations of explicit reasoning chains, and our method can be applied to off-the-shelf decoder language models. Through experiments, we illustrate how CCoT enables additional reasoning over dense contentful representations to achieve corresponding improvements in accuracy. Moreover, the reasoning improvements can be adaptively modified on demand by controlling the number of contemplation tokens generated.
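The core mechanism can be illustrated with a toy sketch. All names here (`step`, `generate_contemplation`, the random linear decoder `W`) are hypothetical stand-ins, not the paper's implementation: the actual CCoT method trains modules on a pretrained decoder LM so that the generated hidden states compress an explicit reasoning chain. The sketch shows only the decoding pattern — each continuous contemplation token is a dense hidden state fed back as the next input embedding, rather than a sampled discrete token, and the count `k` controls the extra inference-time computation.

```python
import numpy as np

# Hypothetical toy sketch of continuous "contemplation token" decoding.
rng = np.random.default_rng(0)
d = 8                                          # hidden size
W = rng.standard_normal((d, d)) / np.sqrt(d)   # stand-in for a decoder LM

def step(h):
    """One decoder step: map the current hidden state to the next one."""
    return np.tanh(W @ h)

def generate_contemplation(h0, k):
    """Autoregressively emit k continuous contemplation tokens.

    Each hidden state is fed back directly as the next input embedding
    (a dense, contentful representation), instead of being projected to
    the vocabulary and sampled as a discrete token.
    """
    tokens, h = [], h0
    for _ in range(k):
        h = step(h)
        tokens.append(h)
    return np.stack(tokens)

h0 = rng.standard_normal(d)        # state after encoding the prompt
z = generate_contemplation(h0, k=5)  # k is adjustable on demand
print(z.shape)  # (5, 8)
```

Because the tokens live in continuous embedding space, varying `k` at inference time trades latency for additional reasoning computation without any discrete vocabulary bottleneck.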