Current reinforcement learning algorithms struggle in sparse-reward and complex environments, most notably in long-horizon manipulation tasks that admit a large number of possible action sequences. In this work, we propose the Intrinsically Guided Exploration from Large Language Models (IGE-LLMs) framework. By leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the exploratory process in reinforcement learning to address intricate, long-horizon robotic manipulation tasks with sparse rewards. We evaluate our framework and related intrinsic learning methods in an environment that is challenging in terms of exploration, and in a complex robotic manipulation task that is challenging in terms of both exploration and long horizons. Results show that IGE-LLMs (i) exhibits notably higher performance than related intrinsic methods and the direct use of LLMs in decision-making, (ii) can be combined with and complements existing learning methods, highlighting its modularity, (iii) is fairly insensitive to different intrinsic scaling parameters, and (iv) maintains robustness against increased levels of uncertainty and longer horizons.
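To make the idea of an LLM acting as an assistive intrinsic reward more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the intrinsic signal is simply added to the sparse task reward after multiplication by a scaling parameter; the abstract mentions intrinsic scaling parameters but does not state the exact form. The names `shaped_reward`, `stub_llm_rating`, and `rate_actions`, the example states and actions, and the value of the scale are all hypothetical, and the LLM is replaced by a hand-written stub so the example runs offline.

```python
from typing import Callable, Dict, Sequence


def shaped_reward(extrinsic: float, intrinsic: float, scale: float) -> float:
    """Add a scaled intrinsic bonus to the (sparse) task reward.

    Assumption: a plain additive combination; the paper may use another form.
    """
    return extrinsic + scale * intrinsic


def stub_llm_rating(state: str, action: str) -> float:
    """Hypothetical stand-in for an LLM query that rates an action in [0, 1].

    A real system would prompt a language model with a description of the
    state and the candidate action; this toy rule only mimics that interface.
    """
    if "no key" in state:
        return 1.0 if action == "pick up key" else 0.0
    return 1.0 if action == "open door" else 0.0


def rate_actions(
    state: str,
    actions: Sequence[str],
    rater: Callable[[str, str], float],
) -> Dict[str, float]:
    """Rate every candidate action in the given state with the supplied rater."""
    return {action: rater(state, action) for action in actions}


if __name__ == "__main__":
    state = "agent in room, no key, door locked"
    actions = ["pick up key", "open door", "move left"]
    ratings = rate_actions(state, actions, stub_llm_rating)

    # The environment reward is sparse: zero until the task is solved.
    extrinsic = 0.0
    for action, rating in ratings.items():
        print(action, shaped_reward(extrinsic, rating, scale=0.1))
```

In this sketch the learning agent still optimizes its own policy; the rating only biases exploration toward actions the rater considers promising, which is different from letting the LLM choose actions directly.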