Large Language Models (LLMs) have achieved tremendous progress, yet they often still struggle with challenging reasoning problems. Current approaches address this by sampling or searching over detailed, low-level reasoning chains. However, these methods remain limited in their exploration capabilities, making it hard for correct solutions to stand out in the huge solution space. In this work, we unleash LLMs' creative potential for exploring multiple diverse problem-solving strategies by framing an LLM as a hierarchical policy via in-context learning. This policy consists of a visionary leader that proposes multiple diverse high-level problem-solving tactics as hints, and a follower that executes detailed problem-solving processes following each high-level instruction. The follower uses each of the leader's directives as a guide and samples multiple reasoning chains to tackle the problem, generating one solution group per leader proposal. Additionally, we propose an effective and efficient tournament-based approach to select among these explored solution groups and reach the final answer. Our approach produces meaningful and inspiring hints, enhances the exploration of problem-solving strategies, and improves final-answer accuracy on challenging problems in the MATH dataset. Code will be released at https://github.com/lz1oceani/LLM-As-Hierarchical-Policy.
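The leader/follower loop and the group-level selection described above can be sketched roughly as follows. This is a minimal illustration, not the released implementation: the names `explore`, `majority_answer`, and `tournament` are hypothetical, the `leader`, `follower`, and `judge` callables stand in for LLM calls, and the majority vote within a group and the sequential knockout between groups are assumptions, since the abstract does not specify how the tournament is run.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical stand-ins for LLM calls (assumptions, not the paper's API).
Leader = Callable[[str, int], List[str]]            # (problem, n_hints) -> hints
Follower = Callable[[str, str], str]                # (problem, hint) -> answer of one sampled chain
Judge = Callable[[str, List[str], List[str]], int]  # returns 0 or 1: index of the preferred group


def explore(problem: str, leader: Leader, follower: Follower,
            n_hints: int = 4, n_chains: int = 8) -> List[List[str]]:
    """Build one solution group per leader hint by sampling the follower."""
    groups = []
    for hint in leader(problem, n_hints):
        # Each call to the follower represents one sampled reasoning chain.
        groups.append([follower(problem, hint) for _ in range(n_chains)])
    return groups


def majority_answer(group: List[str]) -> str:
    """Summarise a group by its most frequent answer (assumed aggregation)."""
    return Counter(group).most_common(1)[0][0]


def tournament(problem: str, groups: List[List[str]], judge: Judge) -> str:
    """Sequential knockout: the current champion meets each remaining group."""
    if not groups:
        raise ValueError("need at least one solution group")
    champion = groups[0]
    for challenger in groups[1:]:
        # The judge returns 1 when it prefers the challenger.
        if judge(problem, champion, challenger) == 1:
            champion = challenger
    return majority_answer(champion)
```

With `k` groups, this knockout needs only `k - 1` judge calls, which is one plausible reading of "efficient"; a judge could be another LLM prompt or a simple agreement heuristic, and either choice is an assumption here.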