Large Language Models (LLMs) have demonstrated remarkable abilities in a variety of language tasks, making them promising candidates for decision-making in robotics. Inspired by Hierarchical Reinforcement Learning (HRL), we propose Retrieval-Augmented in-context reinforcement Learning (RAHL), a novel framework in which an LLM-based high-level policy decomposes a complex task into sub-tasks on the fly. The sub-tasks, each defined by a goal, are assigned to the low-level policy to complete. To improve the agent's performance over multi-episode execution, we propose Hindsight Modular Reflection (HMR): instead of reflecting on the full trajectory, the agent reflects on shorter sub-trajectories, which improves reflection efficiency. We evaluated the decision-making ability of RAHL in three benchmark environments: ALFWorld, Webshop, and HotpotQA. The results show that RAHL outperforms strong baselines by 9%, 42%, and 10%, respectively, over 5 episodes of execution. Furthermore, we implemented RAHL on a Boston Dynamics SPOT robot. The experiment shows that, controlled by the LLM policy, the robot can scan the environment, find entrances, and navigate to new rooms.
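The hierarchical loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the goal list, the failure condition, and all function names are hypothetical stand-ins for what would, in a real system, be LLM queries (the high-level policy proposing the next sub-goal, the low-level policy acting toward it, and HMR reflecting on each failed sub-trajectory separately rather than on the full trajectory).

```python
def high_level_policy(task, history):
    # Stand-in for the LLM-based high-level policy: propose the next
    # sub-goal on the fly, given the task and the sub-tasks completed so far.
    goals = ["find the mug", "pick up the mug", "place the mug on the shelf"]
    return goals[len(history)] if len(history) < len(goals) else None

def low_level_policy(goal):
    # Stand-in for the LLM-based low-level policy: act until the goal is
    # reached (or fails). Here we pretend the final placement fails,
    # purely so the reflection step below has something to work on.
    succeeded = "place" not in goal
    return [f"<action toward '{goal}'>"], succeeded

def hindsight_modular_reflection(sub_trajectories):
    # HMR: reflect on each short sub-trajectory independently, instead of
    # reflecting once on the whole episode trajectory.
    return [f"reflection on failed goal '{t['goal']}'"
            for t in sub_trajectories if not t["done"]]

def run_episode(task):
    history = []
    while (goal := high_level_policy(task, history)) is not None:
        steps, done = low_level_policy(goal)
        history.append({"goal": goal, "steps": steps, "done": done})
    return history, hindsight_modular_reflection(history)

trajectory, reflections = run_episode("put the mug on the shelf")
```

In this sketch the episode yields three sub-trajectories and one reflection (for the failed placement), which a subsequent episode could retrieve as in-context guidance.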