Large language models (LLMs) have demonstrated remarkable reasoning capabilities through chain-of-thought (CoT) prompting, which generates intermediate reasoning chains that serve as the rationale for deriving the answer. However, current CoT methods either simply employ general prompts such as "Let's think step by step" or rely heavily on handcrafted, task-specific demonstrations to attain strong performance, creating an unavoidable gap between performance and generalization. To bridge this gap, we propose Meta-CoT, a generalizable CoT prompting method for mixed-task scenarios in which the type of the input question is unknown. Meta-CoT first categorizes the scenario based on the input question and then automatically constructs diverse demonstrations from the corresponding data pool. Meta-CoT achieves remarkable performance on ten public benchmark reasoning tasks while also exhibiting superior generalization. Notably, Meta-CoT achieves a state-of-the-art result on SVAMP (93.7%) without any additional program-aided methods. Further experiments on five out-of-distribution datasets verify the stability and generality of Meta-CoT.
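The abstract describes a two-stage pipeline: identify the scenario of an incoming question, then automatically assemble diverse demonstrations from that scenario's data pool. The sketch below only illustrates that control flow, under stated assumptions. The keyword router `classify_scenario` is a hypothetical stand-in for the model-based scenario identification. The toy `POOLS` data are invented. Diversity is approximated with greedy farthest-point selection over Jaccard distance (`select_diverse`), a swapped-in technique rather than the paper's own selection procedure. Rationales are left as placeholders where an LLM would generate them.

```python
import re

# Hypothetical toy data pools, one per scenario (invented, not from the paper).
POOLS = {
    "arithmetic": [
        "Tom has 3 apples and buys 5 more. How many apples does he have?",
        "A train travels 60 miles in 2 hours. What is its speed?",
        "Sara had 12 pencils and gave away 4. How many are left?",
        "A shirt costs 20 dollars after a 50 percent discount. What was the original price?",
    ],
    "commonsense": [
        "Is it safe to touch a hot stove?",
        "Would a penguin survive in the desert?",
        "Can fish breathe out of water?",
    ],
    "symbolic": [
        "Take the last letters of the words in 'Bill Gates' and concatenate them.",
        "A coin is heads up. Ann flips it. Is it still heads up?",
    ],
}


def tokens(text):
    """Lowercased set of alphanumeric words, used for a crude lexical similarity."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| over the two questions' word sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def classify_scenario(question):
    """Hypothetical keyword router standing in for LLM-based scenario identification."""
    q = question.lower()
    if "letter" in q or "coin" in q:
        return "symbolic"
    if re.search(r"\d", q):
        return "arithmetic"
    return "commonsense"


def select_diverse(pool, k):
    """Greedy farthest-point selection: each new demonstration maximizes its
    minimum distance to the ones already chosen (a stand-in for clustering)."""
    if k <= 0 or not pool:
        return []
    chosen = [pool[0]]
    while len(chosen) < min(k, len(pool)):
        best = max(
            (q for q in pool if q not in chosen),
            key=lambda q: min(jaccard_distance(q, c) for c in chosen),
        )
        chosen.append(best)
    return chosen


def build_prompt(question, pools, k=2):
    """Route the question to a scenario, pick k diverse demos, and format a CoT prompt."""
    scenario = classify_scenario(question)
    demos = select_diverse(pools[scenario], k)
    # In a real pipeline each demo's rationale would be generated by the LLM
    # (e.g. via zero-shot "Let's think step by step"); here it is a placeholder.
    parts = [f"Q: {d}\nA: Let's think step by step. <rationale>" for d in demos]
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return scenario, demos, "\n\n".join(parts)


if __name__ == "__main__":
    scenario, demos, prompt = build_prompt(
        "Jim has 7 marbles and loses 2. How many remain?", POOLS
    )
    print(scenario)
    print(prompt)
```

Replacing `classify_scenario` with an LLM call, or `jaccard_distance` and `select_diverse` with sentence embeddings and clustering, would keep the same interface; the greedy farthest-point rule is used here only because it needs no third-party libraries.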