Large Language Models (LLMs) have made tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this by sampling or searching over detailed, low-level reasoning chains. However, these methods remain limited in their exploration capabilities, making it hard for correct solutions to stand out in the huge solution space. In this work, we unleash LLMs' creative potential for exploring multiple diverse problem-solving strategies by framing an LLM as a hierarchical policy via in-context learning. This policy consists of a visionary leader that proposes multiple diverse high-level problem-solving tactics as hints, and a follower that executes a detailed problem-solving process following each high-level instruction. The follower uses each of the leader's directives as a guide and samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal. Additionally, we propose an effective and efficient tournament-based approach to select among these explored solution groups to reach the final answer. Our approach produces meaningful and inspiring hints, enhances problem-solving strategy exploration, and improves final answer accuracy on challenging problems in the MATH dataset. Code will be released at https://github.com/lz1oceani/LLM-As-Hierarchical-Policy.
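To make the leader/follower/tournament pipeline concrete, the following is a minimal sketch and not the released implementation. The `leader`, `follower`, and `judge` callables are hypothetical stand-ins for LLM calls. The abstract does not specify how groups are compared or how a group is reduced to one answer, so the sketch assumes a sequential pairwise tournament and a majority vote within the winning group. The toy stubs exist only so the example runs without a model.

```python
import itertools
from collections import Counter
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper's code release):
#   leader(problem, n)         -> n high-level hints
#   follower(problem, hint)    -> final answer of one sampled reasoning chain
#   judge(problem, a, b)       -> True if solution group a is preferred over b
Leader = Callable[[str, int], List[str]]
Follower = Callable[[str, str], str]
Judge = Callable[[str, List[str], List[str]], bool]


def explore(problem: str, leader: Leader, follower: Follower,
            n_hints: int = 4, n_chains: int = 8) -> List[List[str]]:
    """Build one solution group (a list of answers) per leader hint."""
    return [[follower(problem, hint) for _ in range(n_chains)]
            for hint in leader(problem, n_hints)]


def majority_answer(group: List[str]) -> str:
    """Most frequent answer in a group (assumed aggregation rule)."""
    return Counter(group).most_common(1)[0][0]


def tournament(problem: str, groups: List[List[str]], judge: Judge) -> str:
    """Assumed tournament: the current champion meets each remaining group."""
    champion = groups[0]
    for challenger in groups[1:]:
        if not judge(problem, champion, challenger):
            champion = challenger
    return majority_answer(champion)


# --- Toy stubs so the sketch runs without an LLM ---------------------------
def toy_leader(problem: str, n: int) -> List[str]:
    return [f"tactic-{i}" for i in range(n)]


def make_toy_follower() -> Follower:
    calls = itertools.count()

    def follower(problem: str, hint: str) -> str:
        if hint == "tactic-2":          # the one hint that leads to a stable answer
            return "42"
        return "7" if next(calls) % 2 == 0 else "9"   # inconsistent answers

    return follower


def toy_judge(problem: str, a: List[str], b: List[str]) -> bool:
    """Prefer the group whose chains agree more (a stand-in for an LLM judge)."""
    def agreement(group: List[str]) -> float:
        return Counter(group).most_common(1)[0][1] / len(group)
    return agreement(a) >= agreement(b)


if __name__ == "__main__":
    groups = explore("toy problem", toy_leader, make_toy_follower())
    print(tournament("toy problem", groups, toy_judge))
```

In this toy run only the group produced under `tactic-2` is internally consistent, so it wins the tournament. With real models, each callable would wrap a prompted LLM call, and the comparison rule would be whatever the tournament procedure actually specifies.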