Recent studies have found that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly on complex tasks involving mathematics or reasoning. Despite this enormous empirical success, the mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards answering these questions theoretically. Specifically, we examine the capacity of LLMs with CoT to solve fundamental mathematical and decision-making problems. We start with an impossibility result: no bounded-depth Transformer can directly output correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations in a commonly used math language format. Moreover, we show that LLMs with CoT are capable of solving a general class of decision-making problems known as Dynamic Programming, thus justifying the power of CoT in tackling complex real-world tasks. Finally, extensive experiments on four tasks show that, while Transformers always fail to predict the answers directly, they can consistently learn to generate correct solutions step by step given sufficient CoT demonstrations.
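To make the contrast between direct prediction and step-by-step generation concrete, the following is a minimal Python sketch of what a CoT derivation for an arithmetic expression might look like. It is an illustration only, not the construction used in the proofs: the function names (`tokenize`, `reduce_once`, `cot_derivation`) are hypothetical, and it assumes plain integer arithmetic with `+`, `-`, `*`, and parentheses, with no unary minus in the input. Each step rewrites the expression by evaluating a single operation, so every step depends only on a small local part of the previous line.

```python
import re

def tokenize(expr):
    """Split an expression into integer tokens and operator/parenthesis tokens."""
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|[-+*()]", expr)]

def apply_op(a, op, b):
    """Evaluate a single binary operation."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b

def reduce_once(tokens):
    """Perform exactly one reduction step and return the new token list."""
    if ")" in tokens:
        # Innermost group: the first ")" and the nearest "(" before it.
        close = tokens.index(")")
        open_ = max(i for i in range(close) if tokens[i] == "(")
        lo, hi = open_ + 1, close
    else:
        open_ = close = None
        lo, hi = 0, len(tokens)
    span = tokens[lo:hi]  # parenthesis-free: number, op, number, ...
    if len(span) > 1:
        # Operators sit at odd positions; prefer the leftmost "*",
        # otherwise take the leftmost "+" or "-".
        ops = range(1, len(span), 2)
        pos = next((i for i in ops if span[i] == "*"), 1)
        value = apply_op(span[pos - 1], span[pos], span[pos + 1])
        span = span[:pos - 1] + [value] + span[pos + 2:]
    if len(span) == 1 and open_ is not None:
        # The group collapsed to one number, so drop its parentheses.
        return tokens[:open_] + span + tokens[close + 1:]
    return tokens[:lo] + span + tokens[hi:]

def cot_derivation(expr):
    """Return the list of intermediate expressions, ending with the answer."""
    tokens = tokenize(expr)
    steps = ["".join(str(t) for t in tokens)]
    while len(tokens) > 1:
        tokens = reduce_once(tokens)
        steps.append("".join(str(t) for t in tokens))
    return steps
```

For example, `" = ".join(cot_derivation("(7+5)*(6+4*3-2*7)"))` yields `(7+5)*(6+4*3-2*7) = 12*(6+4*3-2*7) = 12*(6+12-2*7) = 12*(6+12-14) = 12*(18-14) = 12*4 = 48`. A model that answers directly would have to produce `48` in one shot, whereas a model trained on such derivations only needs to produce the next line from the current one.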