Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. Specifically, we examine the expressivity of LLMs with CoT in solving fundamental mathematical and decision-making problems. We start by giving an impossibility result showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations using a commonly used math language format. Moreover, we show that LLMs with CoT are capable of solving a general class of decision-making problems known as Dynamic Programming, thus justifying the power of CoT in tackling complex real-world tasks. Finally, extensive experiments on four tasks show that, while Transformers always fail to predict the answers directly, they can consistently learn to generate correct solutions step-by-step given sufficient CoT demonstrations.
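To make the contrast between direct prediction and a CoT derivation concrete, the following is a minimal sketch of what a step-by-step derivation for an arithmetic expression can look like. It is only an illustration of the output format, not the construction used in the proofs, and no Transformer is involved. The function names `reduce_once` and `cot_derivation` are hypothetical, and the sketch assumes Python 3.9+ (for `ast.unparse`) and expressions built from non-negative integer literals with `+`, `-`, `*` and parentheses.

```python
import ast
import operator

# Supported binary operators (an assumption of this sketch).
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def reduce_once(node):
    """Evaluate the leftmost operation whose two operands are both numbers.

    Returns (new_node, changed). Exactly one operation is evaluated per call,
    which mirrors writing one line of a derivation.
    """
    if isinstance(node, ast.Constant):
        return node, False
    if not isinstance(node, ast.BinOp) or type(node.op) not in OPS:
        raise ValueError("unsupported expression")
    if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
        value = OPS[type(node.op)](node.left.value, node.right.value)
        return ast.Constant(value=value), True
    left, changed = reduce_once(node.left)
    if changed:
        return ast.BinOp(left=left, op=node.op, right=node.right), True
    right, changed = reduce_once(node.right)
    return ast.BinOp(left=node.left, op=node.op, right=right), changed


def cot_derivation(expr):
    """Return the full chain 'expr = step_1 = ... = answer' as one string."""
    node = ast.parse(expr, mode="eval").body
    steps = [ast.unparse(node)]
    while not isinstance(node, ast.Constant):
        node, _ = reduce_once(node)
        steps.append(ast.unparse(node))
    return " = ".join(steps)


if __name__ == "__main__":
    print(cot_derivation("(1+2)*(7-3)"))
```

Each step needs only a local, constant-size computation on the previous step, whereas direct prediction must produce the final value from the input in one pass. This is the intuition behind the contrast the abstract draws, not a restatement of the formal result.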