We propose a new Verbal Reinforcement Learning (VRL) framework for interpretable task-level planning in mobile robotic systems operating under execution uncertainty. The framework follows a closed-loop architecture that enables iterative policy improvement through interaction with the physical environment: executable Behavior Trees are repeatedly refined by a Large Language Model (LLM) actor using structured natural-language feedback from a Vision-Language Model (VLM) critic that observes the physical robot and its execution traces. Unlike conventional reinforcement learning, policy updates in VRL occur directly at the symbolic planning level, without gradient-based optimization; this enables transparent reasoning, explicit causal feedback, and human-interpretable policy evolution. We validate the framework on a real mobile robot performing a multi-stage manipulation and navigation task under execution uncertainty. Experimental results show that the framework supports explainable policy improvement, closed-loop adaptation to execution failures, and reliable deployment on physical robotic systems.
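To make the closed loop concrete, the following Python sketch shows one possible shape of a VRL refinement cycle. All names here (vrl_loop, execute_bt, vlm_critic, llm_actor, ExecutionTrace) are hypothetical placeholders introduced for illustration, not interfaces defined by the framework; the sketch only assumes what the abstract states: that the policy is the Behavior Tree text itself and that each update is a verbal rewrite rather than a gradient step.

    # Minimal sketch of a VRL refinement loop, under stated assumptions:
    # - execute_bt runs a Behavior Tree on the robot and returns a trace;
    # - vlm_critic turns the trace (robot observations plus tick log) into
    #   structured natural-language feedback;
    # - llm_actor rewrites the Behavior Tree given that feedback.
    # All identifiers are hypothetical and not taken from the paper.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class ExecutionTrace:
        succeeded: bool
        tick_log: List[str]   # per-node results, e.g. "MoveTo(shelf) -> FAILURE"
        frames: List[bytes]   # camera frames observed during execution

    def vrl_loop(
        initial_bt_xml: str,
        execute_bt: Callable[[str], ExecutionTrace],
        vlm_critic: Callable[[ExecutionTrace], str],
        llm_actor: Callable[[str, str], str],
        max_iterations: int = 5,
    ) -> str:
        """Refine a Behavior Tree at the symbolic level: the 'policy
        update' is a textual rewrite of the tree, not a gradient step."""
        bt_xml = initial_bt_xml
        for _ in range(max_iterations):
            trace = execute_bt(bt_xml)      # act in the physical environment
            if trace.succeeded:
                return bt_xml               # task completed; stop refining
            feedback = vlm_critic(trace)    # e.g. "Grasp failed: gripper
                                            # closed before reaching handle"
            bt_xml = llm_actor(bt_xml, feedback)  # verbal policy update
        return bt_xml

Because each revision of bt_xml is plain text, the entire policy-evolution history can be logged and inspected, which is the sense in which the updates are human-interpretable.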