Generating a Chain-of-Thought (CoT) before deriving the answer can effectively improve the reasoning capabilities of large language models (LLMs) and significantly increase the accuracy of the generated answer. However, in most cases, the generated CoT is much longer than the desired final answer, which incurs additional decoding cost. Furthermore, existing research has found that shortening the reasoning steps in a CoT, even while preserving the key information, diminishes LLMs' abilities. These phenomena make it difficult to use LLMs and CoT in many real-world applications that only require the final answer and are sensitive to latency, such as search and recommendation. To reduce model decoding costs and shorten the length of the generated CoT, this paper presents $\textbf{C}$onditioned $\textbf{C}$ompressed $\textbf{C}$hain-of-$\textbf{T}$hought (C3oT), a CoT compression framework comprising a compressor that compresses an original longer CoT into a shorter CoT while maintaining key information and interpretability, a conditioned training method that trains LLMs on both the longer and shorter CoT simultaneously so they learn the correspondence between the two, and a conditioned inference method that retains the reasoning ability learned from the longer CoT while generating the shorter CoT. We conduct experiments on four datasets from arithmetic and commonsense scenarios, showing that the proposed method can compress the length of the generated CoT by more than 50% without compromising its effectiveness.
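The conditioned training and inference idea can be illustrated with a minimal data-formatting sketch. This is an assumption-laden illustration, not the paper's actual implementation: the condition markers, function names, and output layout below are all hypothetical.

```python
# Illustrative sketch of conditioned training/inference for CoT compression.
# The condition markers and formatting are assumptions for demonstration only;
# the paper's actual tokens and templates may differ.

LONG_COND = "[Condition: Long]"    # hypothetical marker selecting the full CoT
SHORT_COND = "[Condition: Short]"  # hypothetical marker selecting the compressed CoT

def build_training_pairs(question: str, long_cot: str, short_cot: str, answer: str):
    """Emit one (prompt, target) pair per condition so the model is trained on
    both the longer and the compressed CoT and can learn their correspondence."""
    return [
        (f"{LONG_COND} {question}", f"{long_cot} Answer: {answer}"),
        (f"{SHORT_COND} {question}", f"{short_cot} Answer: {answer}"),
    ]

def build_inference_prompt(question: str) -> str:
    """At inference time, condition on the short marker so the model generates
    the compressed CoT while keeping the reasoning ability learned from the
    longer CoT."""
    return f"{SHORT_COND} {question}"
```

In this sketch, the same question appears twice in the training data under different condition markers, which is one straightforward way to realize "training with both longer CoT and shorter CoT simultaneously"; at inference, only the short-condition prompt is used.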