Monte Carlo Tree Search (MCTS) has emerged as a powerful tool for decision-making in robotics, enabling efficient exploration of large search spaces. However, traditional MCTS methods struggle in environments characterized by high uncertainty and noisy data due to their reliance on final-step reward evaluation. The lack of intermediate feedback during search often results in suboptimal decision-making and computational inefficiencies. This paper introduces Reward-Centered ReST-MCTS, a novel framework that enhances MCTS by incorporating intermediate reward shaping. The core of our approach is the Rewarding Center, which refines search trajectories by dynamically assigning partial rewards using rule-based validation, heuristic guidance, and neural estimation. By integrating these mechanisms, our method enables real-time optimization of search paths, mitigating the effects of error propagation. We evaluate Reward-Centered ReST-MCTS in robotic manipulation tasks under high uncertainty, demonstrating consistent improvements in decision accuracy. Compared to baseline methods, including Chain-of-Thought (CoT) prompting and Vanilla ReST-MCTS, our framework achieves a 2-4% accuracy improvement while maintaining computational feasibility. Ablation studies confirm the effectiveness of intermediate feedback in search refinement, particularly in pruning incorrect decision paths early. Furthermore, robustness tests show that our method retains high performance across varying levels of uncertainty.
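The mechanism described above — a Rewarding Center that blends rule-based validation, heuristic guidance, and neural estimation into partial rewards backed up at every search step — can be sketched on a toy problem. The following is a minimal illustration, not the paper's implementation: the integer-walk environment, the reward weights, and the `tanh` stand-in for a learned value network are all invented for the example.

```python
import math
import random

GOAL, HORIZON, ACTIONS = 5, 8, (+1, -1)  # toy task: walk an integer toward GOAL

def rule_reward(state):
    """Rule-based validation: hard penalty for invalid (negative) states."""
    return -1.0 if state < 0 else 0.0

def heuristic_reward(state):
    """Heuristic guidance: scaled closeness to the goal state."""
    return 1.0 - abs(GOAL - state) / GOAL

def neural_estimate(state):
    """Stand-in for a learned value network (here a fixed squashing function)."""
    return math.tanh(state / GOAL)

def shaped_reward(state, w=(1.0, 0.5, 0.5)):
    """Rewarding-Center-style partial reward: weighted blend of the three signals."""
    return w[0] * rule_reward(state) + w[1] * heuristic_reward(state) + w[2] * neural_estimate(state)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def uct_child(node, c=1.4):
    """Standard UCT selection over the node's children."""
    return max(node.children.values(),
               key=lambda n: n.value / (n.visits + 1e-9)
                             + c * math.sqrt(math.log(node.visits + 1) / (n.visits + 1e-9)))

def rollout(state, depth):
    """Random rollout that accumulates shaped rewards at every step,
    rather than evaluating only the terminal state."""
    total = 0.0
    for _ in range(depth, HORIZON):
        state += random.choice(ACTIONS)
        total += shaped_reward(state)
    return total

def mcts(root_state, iters=300):
    root = Node(root_state)
    for _ in range(iters):
        node, depth = root, 0
        # Selection: descend by UCT while children exist.
        while node.children and depth < HORIZON:
            node, depth = uct_child(node), depth + 1
        # Expansion: add one child per action, pick one at random.
        if depth < HORIZON:
            for a in ACTIONS:
                node.children.setdefault(a, Node(node.state + a, node))
            node, depth = random.choice(list(node.children.values())), depth + 1
        # Simulation: intermediate shaped reward plus a shaped rollout.
        value = shaped_reward(node.state) + rollout(node.state, depth)
        # Backpropagation.
        while node:
            node.visits += 1
            node.value += value
            node = node.parent
    # Return the most-visited root action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Because invalid (negative) states are penalized immediately by `rule_reward`, branches heading away from the goal receive low backed-up values early and are effectively pruned by UCT, mirroring the early-pruning effect the ablations attribute to intermediate feedback.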