Most existing prompting methods generalize poorly and lack consistency: they often rely on instance-specific solutions that may not transfer to other instances, and the selected few-shot examples lack task-level consistency. To address these limitations, we propose StrategyLLM, a comprehensive framework that lets LLMs perform inductive reasoning (deriving general strategies from specific task instances) and deductive reasoning (applying these general strategies to particular task examples) to construct generalizable and consistent few-shot prompts. It employs four LLM-based agents (strategy generator, executor, optimizer, and evaluator) that work together to generate, evaluate, and select promising strategies for a given task. Experimental results demonstrate that, without human involvement, StrategyLLM outperforms the competitive baseline CoT-SC, which requires human-annotated solutions, on 13 datasets across 4 challenging tasks: math reasoning (34.2\% $\rightarrow$ 38.8\%), commonsense reasoning (70.3\% $\rightarrow$ 72.5\%), algorithmic reasoning (73.7\% $\rightarrow$ 85.0\%), and symbolic reasoning (30.0\% $\rightarrow$ 79.2\%). Further analysis reveals that StrategyLLM is applicable to various LLMs and demonstrates advantages across numerous scenarios.
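To make the division of labor among the four agents concrete, the following is a minimal sketch, not the authors' implementation. It assumes each agent can be modeled as a plain function. The names `select_strategies`, `accuracy`, and `Candidate`, the `threshold`, `max_rounds`, and `top_k` parameters, and the train/validation split are all hypothetical, since the abstract does not specify them. In a real system the `generate`, `execute`, and `optimize` callables would wrap LLM calls; here they are toy stand-ins so the loop can run.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (question, gold answer)


@dataclass
class Candidate:
    strategy: str
    accuracy: float


def accuracy(strategy: str, examples: List[Example],
             execute: Callable[[str, str], str]) -> float:
    """Fraction of examples that the strategy answers correctly."""
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(execute(strategy, q) == gold for q, gold in examples)
    return hits / len(examples)


def select_strategies(train, validation, generate, execute, optimize,
                      threshold=0.75, max_rounds=3, top_k=1):
    """Generate, execute, optimize, and rank strategies (hypothetical loop)."""
    qualified: List[str] = []
    pending = generate(train)                      # generator: induce strategies
    for round_idx in range(max_rounds):
        failed = []
        for s in pending:
            acc = accuracy(s, train, execute)      # executor: apply the strategy
            if acc >= threshold:
                if s not in qualified:             # skip duplicate strategies
                    qualified.append(s)
            else:
                failed.append((s, acc))
        if not failed or round_idx == max_rounds - 1:
            break
        # optimizer: revise strategies that fell below the threshold
        pending = [optimize(s, acc) for s, acc in failed]
    # evaluator: rank qualified strategies on held-out examples
    ranked = [Candidate(s, accuracy(s, validation, execute)) for s in qualified]
    ranked.sort(key=lambda c: c.accuracy, reverse=True)
    return ranked[:top_k]


# Toy stand-ins for the LLM-backed agents (assumptions for illustration only).
def toy_generate(examples):
    return ["concat", "add"]


def toy_execute(strategy, question):
    a, b = question.split("+")
    return str(int(a) + int(b)) if strategy == "add" else a + b


def toy_optimize(strategy, acc):
    return "add"


train = [("2+3", "5"), ("1+1", "2")]
validation = [("4+4", "8")]
best = select_strategies(train, validation,
                         toy_generate, toy_execute, toy_optimize)
```

In this toy run the "concat" strategy fails on the training examples and is revised by the optimizer, while "add" qualifies immediately and is then scored on the held-out example. Separating the qualification threshold from the final ranking is one possible design. The abstract states only that the agents jointly generate, evaluate, and select strategies.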