Recent studies have discovered that Chain-of-Thought (CoT) prompting can dramatically improve the performance of Large Language Models (LLMs), particularly when dealing with complex tasks involving mathematics or reasoning. Despite the enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. Specifically, we examine the expressivity of LLMs with CoT in solving fundamental mathematical and decision-making problems. By using circuit complexity theory, we first give impossibility results showing that bounded-depth Transformers are unable to directly produce correct answers for basic arithmetic/equation tasks unless the model size grows super-polynomially with respect to the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations using a commonly used math language format. Moreover, we show that LLMs with CoT can handle a general class of decision-making problems known as Dynamic Programming, thus justifying the power of CoT in tackling complex real-world tasks. Finally, extensive experiments show that, while Transformers always fail to directly predict the answers, they can consistently learn to generate correct solutions step-by-step given sufficient CoT demonstrations.
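To make the contrast between direct answers and CoT derivations concrete, the following is a minimal sketch of what a step-by-step arithmetic derivation can look like as a string. It is an illustration only, not the construction used in the paper: the function names (`reduce_flat`, `cot_step`, `cot_derivation`) are hypothetical, and the sketch assumes expressions built from non-negative integers, `+`, `*`, and parentheses. Each step rewrites exactly one operation, so the full derivation is a chain of expressions joined by `=`.

```python
import re


def reduce_flat(flat: str) -> str:
    """Evaluate one operation in a parenthesis-free expression.

    Multiplication is reduced before addition, leftmost first.
    Returns the input unchanged if no operation is left.
    """
    rules = (
        (r"(\d+)\*(\d+)", lambda a, b: a * b),
        (r"(\d+)\+(\d+)", lambda a, b: a + b),
    )
    for pattern, fn in rules:
        m = re.search(pattern, flat)
        if m:
            value = fn(int(m.group(1)), int(m.group(2)))
            return flat[:m.start()] + str(value) + flat[m.end():]
    return flat


def cot_step(expr: str) -> str:
    """Perform a single derivation step on the whole expression."""
    # Work inside the leftmost innermost parenthesized group, if any.
    m = re.search(r"\(([^()]*)\)", expr)
    if m:
        inner = reduce_flat(m.group(1))
        # Drop the parentheses once the group has collapsed to a number.
        replacement = inner if inner.isdigit() else f"({inner})"
        return expr[:m.start()] + replacement + expr[m.end():]
    return reduce_flat(expr)


def cot_derivation(expr: str) -> str:
    """Return the chain 'expr=step1=step2=...=answer'."""
    steps = [expr.replace(" ", "")]
    while not steps[-1].isdigit():
        nxt = cot_step(steps[-1])
        if nxt == steps[-1]:
            raise ValueError(f"cannot reduce: {steps[-1]}")
        steps.append(nxt)
    return "=".join(steps)


print(cot_derivation("(7+5)*(2+3*4)"))
```

In this sketch every intermediate expression depends only on the one before it, which loosely mirrors how an autoregressive model could produce the derivation one step at a time instead of emitting the final value directly.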