Despite recent advances in large language models, open-source models often struggle to consistently perform well on complex reasoning tasks. Existing ensemble methods, whether applied at the token or output levels, fail to address these challenges. In response, we present Language model Ensemble with Monte Carlo Tree Search (LE-MCTS), a novel framework for process-level ensembling of language models. LE-MCTS formulates step-by-step reasoning with an ensemble of language models as a Markov decision process. In this framework, states represent intermediate reasoning paths, while actions consist of generating the next reasoning step using one of the language models selected from a predefined pool. Guided by a process-based reward model, LE-MCTS performs a tree search over the reasoning steps generated by different language models, identifying the most accurate reasoning chain. Experimental results on five mathematical reasoning benchmarks demonstrate that our approach outperforms both single language model decoding algorithms and language model ensemble methods. Notably, LE-MCTS improves performance by 3.6% and 4.3% on the MATH and MQA datasets, respectively, highlighting its effectiveness in solving complex reasoning problems.
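The MDP formulation above can be illustrated with a minimal sketch: states are tuples of reasoning steps, each action appends the next step generated by one model from a pool, and a process-based reward model scores completed chains. Everything here is a stand-in assumption, not the paper's implementation: `models` are plain callables rather than language models, `reward_model` is a toy scorer rather than a trained process reward model, and the rollout/UCB details are generic MCTS choices.

```python
import math
import random

def mcts_ensemble(models, reward_model, max_steps=3, iters=300, c=1.4):
    """Toy Monte Carlo tree search over reasoning steps from a model pool.

    models: list of callables state -> next step (stand-ins for LMs).
    reward_model: callable scoring a complete chain in [0, 1]
                  (stand-in for a process-based reward model).
    Returns the most-visited chain of steps.
    """
    root = {"state": (), "children": {}, "visits": 0, "value": 0.0}

    def ucb(parent, child):
        # Standard UCT score: exploit mean reward, explore rarely-tried actions.
        if child["visits"] == 0:
            return float("inf")
        return (child["value"] / child["visits"]
                + c * math.sqrt(math.log(parent["visits"]) / child["visits"]))

    for _ in range(iters):
        # Selection: descend while the node is non-terminal and fully expanded.
        node, path = root, [root]
        while (len(node["state"]) < max_steps
               and len(node["children"]) == len(models)):
            node = max(node["children"].values(), key=lambda ch: ucb(node, ch))
            path.append(node)
        # Expansion: the action space is "which model generates the next step".
        if len(node["state"]) < max_steps:
            untried = [i for i in range(len(models)) if i not in node["children"]]
            i = random.choice(untried)
            step = models[i](node["state"])
            child = {"state": node["state"] + (step,), "children": {},
                     "visits": 0, "value": 0.0}
            node["children"][i] = child
            node = child
            path.append(node)
        # Rollout: finish the chain with randomly chosen models, then score it.
        state = node["state"]
        while len(state) < max_steps:
            state = state + (random.choice(models)(state),)
        reward = reward_model(state)
        # Backpropagation along the selected path.
        for n in path:
            n["visits"] += 1
            n["value"] += reward
    # Extract the most-visited chain as the final answer path.
    chain, node = [], root
    while node["children"]:
        node = max(node["children"].values(), key=lambda ch: ch["visits"])
        chain.append(node["state"][-1])
    return tuple(chain)
```

With stub generators and a reward that favors one model's steps, the search quickly concentrates visits on the highest-reward chain, mirroring how LE-MCTS mixes steps from different models and keeps the chain the reward model prefers.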