Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly on complex tasks involving mathematics or reasoning. Despite this enormous empirical success, the mechanisms behind CoT, and how it unlocks the potential of LLMs, remain elusive. In this paper, we take a first step towards answering these questions theoretically. Specifically, we examine the expressivity of LLMs with CoT in solving fundamental mathematical and decision-making problems. We start with an impossibility result showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations in a commonly used math language format. Moreover, we show that LLMs with CoT are capable of solving a general class of decision-making problems known as Dynamic Programming, thus justifying the power of CoT in tackling complex real-world tasks. Finally, extensive experiments on four tasks show that, while Transformers always fail to predict the answers directly, they can consistently learn to generate correct solutions step-by-step given sufficient CoT demonstrations.
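To make the contrast between direct prediction and a step-by-step derivation concrete, the following is a minimal Python sketch of what a CoT-style derivation for an arithmetic expression might look like: the expression is rewritten one elementary operation at a time until a single number remains. This is an illustration only, not a Transformer and not necessarily the exact derivation format used in the paper. The function names (`tokenize`, `reduce_once`, `cot_derivation`) are hypothetical, and the sketch assumes non-negative integer literals with only `+`, `-`, `*`, and parentheses.

```python
import re


def tokenize(expr):
    """Split an expression string into int tokens and operator/parenthesis tokens."""
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|[-+*()]", expr)]


def render(tokens):
    """Join tokens back into a compact string."""
    return "".join(str(t) for t in tokens)


def apply_op(a, op, b):
    """Apply a single binary operation."""
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    return a * b


def reduce_once(tokens):
    """Perform exactly one rewriting step and return the new token list."""
    if ")" in tokens:
        # The first ")" and the nearest "(" before it delimit an innermost group.
        close = tokens.index(")")
        open_ = max(i for i in range(close) if tokens[i] == "(")
        lo, hi = open_ + 1, close
    else:
        open_ = close = None
        lo, hi = 0, len(tokens)
    span = tokens[lo:hi]
    if open_ is not None and len(span) == 1:
        # "(n)" becomes "n".
        return tokens[:open_] + span + tokens[close + 1:]
    # Multiplication binds tighter; otherwise take the leftmost "+" or "-".
    ops = ([i for i, t in enumerate(span) if t == "*"]
           or [i for i, t in enumerate(span) if t in ("+", "-")])
    i = lo + ops[0]
    value = apply_op(tokens[i - 1], tokens[i], tokens[i + 1])
    return tokens[:i - 1] + [value] + tokens[i + 2:]


def cot_derivation(expr):
    """Return every intermediate expression, from the input down to the answer."""
    tokens = tokenize(expr)
    steps = [render(tokens)]
    while len(tokens) > 1:
        tokens = reduce_once(tokens)
        steps.append(render(tokens))
    return steps


if __name__ == "__main__":
    print(" = ".join(cot_derivation("(7+5)*(6+4*3-2*7)")))
```

Each step needs only a local, constant amount of work on the previous line, whereas producing the final value directly requires resolving the whole nested expression at once. That is the intuition behind the separation the abstract describes, although the formal results concern Transformer expressivity rather than this toy procedure.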