Divide and Conquer for Large Language Models Reasoning

Large language models (LLMs) have shown impressive performance on various reasoning benchmarks with the emergence of Chain-of-Thought (CoT) and its derivative methods, particularly on tasks involving multi-choice questions (MCQs). However, existing methods process all questions uniformly without considering problem-solving difficulty, which means excessive focus on simple questions and insufficient attention to intricate ones. To address this challenge, inspired by how humans use heuristic strategies to categorize tasks and handle them individually, we propose to apply Divide and Conquer to LLM reasoning. First, we divide questions into different subsets based on a statistical confidence score ($\mathcal{CS}$). We then fix the nearly resolved subsets and conquer the demanding ones, which require nuanced processing, with elaborately designed methods, including Prior Knowledge based Reasoning (PKR) and Filter Choices based Reasoning (FCR), as well as their integration variants. Our experiments demonstrate that the proposed strategy significantly boosts the models' reasoning abilities across nine datasets involving arithmetic, commonsense, and logic tasks. For instance, compared to the baseline, we obtain improvements on the low-confidence subsets of 8.72\% for AQuA, 15.07\% for ARC Challenge, and 7.71\% for RiddleSense. In addition, through extensive analysis of rationale length and number of options, we verify that longer reasoning paths in PKR can prevent models from resorting to shortcuts that harm inference, and we find that removing irrelevant choices in FCR substantially reduces the models' confusion. The code is available at \url{https://github.com/AiMijie/Divide-and-Conquer}.
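
The divide step can be illustrated with a minimal sketch. The abstract does not define $\mathcal{CS}$, so the code below assumes it is the share of independently sampled answers that agree with the majority answer; that definition, the 0.8 threshold, and the names `confidence_score` and `divide` are illustrative assumptions rather than the authors' implementation.

```python
from collections import Counter


def confidence_score(sampled_answers):
    """Return (majority answer, share of samples agreeing with it).

    Assumption: CS is modeled as the agreement rate among sampled answers.
    """
    if not sampled_answers:
        raise ValueError("need at least one sampled answer")
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    return answer, votes / len(sampled_answers)


def divide(questions, threshold=0.8):
    """Split {question_id: [sampled answers]} into high/low-confidence subsets.

    The threshold is a hypothetical value chosen for illustration only.
    """
    high, low = {}, {}
    for qid, answers in questions.items():
        answer, score = confidence_score(answers)
        target = high if score >= threshold else low
        target[qid] = (answer, score)
    return high, low


if __name__ == "__main__":
    samples = {
        "q1": ["A", "A", "A", "A", "B"],  # mostly consistent answers
        "q2": ["A", "B", "C", "B", "D"],  # scattered answers
    }
    high, low = divide(samples)
    print("high confidence:", high)
    print("low confidence:", low)
```

Under these assumptions, the high-confidence subset would keep its majority answer, while the low-confidence subset would be passed on to a more careful method such as PKR or FCR, neither of which is sketched here.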
