We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures needed to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process in which LLMs select multiple atomic reasoning modules, such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves the performance of GPT-4 and PaLM 2 on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x less inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families (from PaLM 2-L to GPT-4, and from GPT-4 to Llama2) and share commonalities with human reasoning patterns.
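The self-discovery process described in the abstract (select atomic reasoning modules, compose them into an explicit structure, then follow that structure during decoding) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `llm` argument is a hypothetical callable mapping a prompt string to a completion string, the prompt wording is invented, and the module list contains only the two examples the abstract names. It is not the authors' implementation.

```python
from typing import Callable, List

# Hypothetical stand-in for any text-completion model: prompt in, text out.
LLM = Callable[[str], str]

# Only the two module names given in the abstract; a real list would be longer.
REASONING_MODULES: List[str] = [
    "critical thinking",
    "step-by-step thinking",
]


def discover_structure(llm: LLM, task_description: str, modules: List[str]) -> str:
    """Select useful modules, then compose them into a reasoning structure.

    The two prompts below are illustrative wording, not taken from the paper.
    """
    menu = "\n".join(f"- {m}" for m in modules)
    selected = llm(
        f"Task: {task_description}\n"
        f"Available reasoning modules:\n{menu}\n"
        "Select the modules that are useful for solving this task."
    )
    structure = llm(
        f"Task: {task_description}\n"
        f"Selected modules:\n{selected}\n"
        "Compose these modules into an explicit, ordered reasoning structure."
    )
    return structure


def solve(llm: LLM, structure: str, problem: str) -> str:
    """Answer one problem while following a previously discovered structure."""
    return llm(
        f"Follow this reasoning structure:\n{structure}\n"
        f"Problem: {problem}\n"
        "Answer:"
    )
```

In this sketch, `discover_structure` costs two model calls and `solve` costs one call per problem. Reusing a single discovered structure across all instances of a task is an assumption of the sketch; it is one plausible reading of "task-intrinsic" and of the reported savings relative to sampling-heavy methods such as CoT-Self-Consistency.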