Multi-step reasoning remains a key challenge for Large Language Models (LLMs), particularly in complex domains such as mathematics and creative writing. While recent approaches including ReAct, Reflexion, and Self-Refine improve reasoning through iterative refinement and reflection, they often lack structured exploration of alternative solution paths and persistent learning across problems. We propose ReTreVal (Reasoning Tree with Validation), a hybrid framework that integrates Tree-of-Thoughts exploration, self-refinement, LLM-based critique scoring, and reflexion memory to enable bounded and validated multi-step reasoning. ReTreVal constructs a structured reasoning tree whose depth adapts to problem complexity, and each node undergoes iterative self-critique and refinement guided by explicit LLM-generated feedback. A dual validation mechanism evaluates reasoning quality, coherence, and correctness at each node, while insights from successful reasoning paths and failure patterns are persistently stored in a reflexion memory buffer, enabling cross-problem learning. Critique-based pruning retains only the top-k highest-scoring nodes at each level, controlling computational cost while preserving high-quality solution paths. We evaluate ReTreVal against ReAct, Reflexion, and Self-Refine on 500 mathematical and creative writing tasks, using Qwen 2.5 7B as the underlying LLM. ReTreVal consistently outperforms these baselines; its combination of structured exploration, critique-driven refinement, and cross-problem memory makes it particularly effective for tasks requiring exploratory reasoning, rigorous verification, and knowledge transfer.
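The level-by-level expansion and critique-based top-k pruning described above can be sketched as follows. This is a minimal illustrative Python sketch, not the authors' implementation: the LLM critique is stubbed with a toy heuristic, and all names (`expand`, `critique_score`, `retreval_search`) are assumptions introduced for illustration.

```python
# Hypothetical sketch of ReTreVal-style tree search with critique-based
# top-k pruning. A real system would call an LLM to propose thoughts and
# to score quality, coherence, and correctness; both are stubbed here.
from dataclasses import dataclass, field


@dataclass
class Node:
    thought: str
    score: float = 0.0
    children: list = field(default_factory=list)


def expand(node: Node, branching: int = 3) -> list:
    """Stub generator: propose candidate next reasoning steps for a node."""
    return [Node(f"{node.thought}->step{i}") for i in range(branching)]


def critique_score(node: Node) -> float:
    """Stub critique: stands in for an LLM-generated score of the
    partial reasoning chain. Toy heuristic: prefer shorter chains."""
    return -float(len(node.thought))


def retreval_search(root: Node, depth: int = 3, top_k: int = 2) -> Node:
    """Expand the tree level by level; at each level, score every
    candidate and retain only the top-k highest-scoring nodes."""
    frontier = [root]
    for _ in range(depth):
        candidates = []
        for node in frontier:
            node.children = expand(node)
            for child in node.children:
                child.score = critique_score(child)
                candidates.append(child)
        # Critique-based pruning: keep the k best nodes at this level.
        frontier = sorted(candidates, key=lambda n: n.score, reverse=True)[:top_k]
    return max(frontier, key=lambda n: n.score)


best = retreval_search(Node("problem"))
print(best.thought)
```

Because the search keeps only `top_k` nodes per level, the number of LLM calls per level is bounded by `top_k * branching` regardless of tree depth, which is how the pruning controls computational cost while retaining high-scoring paths.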