Large Language Models (LLMs) have demonstrated remarkable abilities across a wide range of language tasks, making them promising candidates for decision-making in robotics. Inspired by Hierarchical Reinforcement Learning (HRL), we propose Hierarchical in-Context Reinforcement Learning (HCRL), a novel framework in which an LLM-based high-level policy decomposes a complex task into sub-tasks on the fly. Each sub-task, defined by a goal, is assigned to the low-level policy to complete. Once the LLM agent determines that the current goal has been achieved, it proposes a new goal. To improve the agent's performance over multi-episode execution, we propose Hindsight Modular Reflection (HMR): instead of reflecting on the full trajectory, we replace the task objective with the intermediate goals and let the agent reflect on shorter sub-trajectories, improving reflection efficiency. We evaluate the decision-making ability of HCRL in three benchmark environments: ALFWorld, Webshop, and HotpotQA. Results show that HCRL achieves 9%, 42%, and 10% performance improvements over strong in-context learning baselines within five episodes of execution.
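To make the interplay between the two policy levels and HMR concrete, the sketch below gives one plausible reading of the episode loop described above. It is a minimal illustration under stated assumptions: the callables `high_level`, `low_level`, `is_goal_done`, and `reflect`, and the `reset`/`step` environment interface, are hypothetical names, not the paper's actual implementation.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, str, str]             # (goal, action, observation)
SubTrajectory = Tuple[str, List[Step]]  # (goal, steps taken under that goal)

def run_episode(env,
                task: str,
                reflections: List[str],
                high_level: Callable[[str, str, List[str]], str],
                low_level: Callable[[str, str, List[str]], str],
                is_goal_done: Callable[[str, str], bool],
                max_steps: int = 50) -> Tuple[List[SubTrajectory], float]:
    """The high-level policy proposes goals on the fly; the low-level
    policy acts until the agent judges the current goal achieved, at
    which point a new goal is proposed. (Illustrative sketch only.)"""
    obs = env.reset(task)
    trajectory: List[SubTrajectory] = []
    goal = high_level(task, obs, reflections)
    steps: List[Step] = []
    reward = 0.0
    for _ in range(max_steps):
        action = low_level(goal, obs, reflections)
        obs, reward, done = env.step(action)
        steps.append((goal, action, obs))
        if done:
            break
        if is_goal_done(goal, obs):                    # current goal finished
            trajectory.append((goal, steps))
            goal = high_level(task, obs, reflections)  # propose a new goal
            steps = []
    trajectory.append((goal, steps))
    return trajectory, reward

def hindsight_modular_reflection(trajectory: List[SubTrajectory],
                                 reflect: Callable[[str, List[Step]], str]
                                 ) -> List[str]:
    """HMR sketch: reflect on each sub-trajectory separately, substituting
    the intermediate goal for the full task objective, so each reflection
    covers a shorter trajectory than whole-episode reflection."""
    return [reflect(goal, steps) for goal, steps in trajectory]
```

The key design point this sketch tries to capture is that reflection is modular: feedback is generated per goal-delimited segment and carried into subsequent episodes via `reflections`, rather than produced once over the full trajectory.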