Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods, each with its own strengths and weaknesses. We show that the best of both worlds can be combined by applying different methods to different problems and using a large language model (LLM) to select which one to use. Our theoretical analysis shows that the performance gain depends on how much the combined methods differ and on how often the correct model is selected. Our approach yields significant improvements on eight reasoning datasets. Furthermore, we achieve new state-of-the-art results on GSM8K and SVAMP, with accuracies of 96.5% and 93.7%, respectively. Our code is publicly available at https://github.com/XuZhao0/Model-Selection-Reasoning.
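The claim that the gain depends on how much the two methods differ and on the selection success rate can be made concrete with a small accounting sketch. It rests on a simplifying assumption that does not come from the paper. When CoT and PAL agree (both right or both wrong), the choice does not matter. When exactly one is right, the selector picks it with a fixed probability. The function names and this fixed-probability model are hypothetical illustrations, not the paper's formulation or code.

```python
from collections.abc import Sequence


def expected_combined_accuracy(
    cot_correct: Sequence[bool],
    pal_correct: Sequence[bool],
    selection_success_rate: float,
) -> float:
    """Expected accuracy when a selector picks CoT or PAL per question.

    Hypothetical model: on questions where exactly one method is correct,
    the selector picks the correct one with probability
    `selection_success_rate`; elsewhere the choice does not matter.
    """
    if len(cot_correct) != len(pal_correct) or not cot_correct:
        raise ValueError("need the same non-empty set of questions for both methods")
    if not 0.0 <= selection_success_rate <= 1.0:
        raise ValueError("selection_success_rate must be in [0, 1]")
    n = len(cot_correct)
    # Questions both methods solve are correct regardless of the selection.
    both = sum(c and p for c, p in zip(cot_correct, pal_correct))
    # Disagreements are the only questions where selection changes the outcome.
    exactly_one = sum(c != p for c, p in zip(cot_correct, pal_correct))
    return (both + selection_success_rate * exactly_one) / n


def break_even_success_rate(
    cot_correct: Sequence[bool],
    pal_correct: Sequence[bool],
) -> float:
    """Selection success rate at which the combination matches the better single method.

    Under the same hypothetical model, any higher rate beats both methods alone.
    """
    best_single = max(sum(cot_correct), sum(pal_correct))
    both = sum(c and p for c, p in zip(cot_correct, pal_correct))
    exactly_one = sum(c != p for c, p in zip(cot_correct, pal_correct))
    if exactly_one == 0:
        # Methods never disagree: selection cannot help or hurt.
        return 0.0
    return (best_single - both) / exactly_one
```

For example, if CoT solves 3 of 5 questions and PAL solves 2, with the two disagreeing on 3 questions, a selector that is right 80% of the time yields an expected accuracy of 0.68. Any success rate below 2/3 would fall behind CoT alone. The more often the methods disagree, the more room the selector has to help, which mirrors the dependence stated above.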