Reinforcement learning requires interaction with an environment, which is expensive for robots. This constraint calls for approaches that work with limited environment interaction by maximizing the reuse of previous experience. We propose an approach that maximizes experience reuse while learning to solve a given task by generating useful auxiliary tasks and learning them simultaneously. To generate these tasks, we construct an abstract temporal logic representation of the given task and leverage large language models to generate context-aware object embeddings that facilitate object replacements. Counterfactual reasoning and off-policy methods allow us to learn these auxiliary tasks while solving the given target task. We combine these insights into a novel framework for multitask reinforcement learning and show experimentally that our generated auxiliary tasks share exploration requirements similar to those of the given task, thereby maximizing the utility of directed exploration. Our approach allows agents to automatically learn additional useful policies without extra environment interaction.
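
To make the idea of learning auxiliary tasks from the same experience concrete, the following is a minimal sketch and not the framework described above. It assumes tabular Q-learning as the off-policy method, a hypothetical five-state corridor as the environment, and two hypothetical tasks ("target" and "auxiliary") that differ only in their goal state. Every name in it (`step`, `reward`, `train`, `greedy_action`, `GOALS`) is an illustrative assumption. The agent acts only to solve the target task; each collected transition is then relabeled with every task's reward, so the auxiliary policy is learned with no extra environment steps.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: states 0..4, actions move left (-1) or right (+1).
N_STATES = 5
ACTIONS = (-1, 1)

# Hypothetical tasks, each defined by a goal state. Only "target" drives
# behaviour; "auxiliary" is learned purely from relabeled experience.
GOALS = {"target": 4, "auxiliary": 3}


def step(state, action):
    """Deterministic transition, clipped to the corridor bounds."""
    return min(max(state + action, 0), N_STATES - 1)


def reward(task, next_state):
    """Reward 1.0 for entering the task's goal state, else 0.0."""
    return 1.0 if next_state == GOALS[task] else 0.0


def greedy_action(q_task, state, rng=None):
    """Greedy action for one task; ties are broken randomly if rng is given."""
    values = [q_task[(state, a)] for a in ACTIONS]
    best = max(values)
    candidates = [a for a, v in zip(ACTIONS, values) if v == best]
    return rng.choice(candidates) if rng else candidates[0]


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {task: defaultdict(float) for task in GOALS}
    for _ in range(episodes):
        state = 0
        for _ in range(20):
            # Behaviour policy: epsilon-greedy with respect to the target task only.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q["target"], state, rng)
            next_state = step(state, action)

            # Counterfactual relabeling: ask what reward this same transition
            # would have earned under each task, then update that task's
            # Q-table with an off-policy (Q-learning) backup.
            for task in GOALS:
                r = reward(task, next_state)
                done = next_state == GOALS[task]
                best_next = 0.0 if done else max(
                    q[task][(next_state, a)] for a in ACTIONS
                )
                key = (state, action)
                q[task][key] += alpha * (r + gamma * best_next - q[task][key])

            if next_state == GOALS["target"]:
                break  # the episode ends only when the target task is solved
            state = next_state
    return q
```

Calling `q = train()` and then `greedy_action(q["auxiliary"], 2)` queries the auxiliary policy, which was never used to select actions during training. The sketch works because the auxiliary goal lies on the path the agent explores for the target task, which loosely mirrors the claim that useful auxiliary tasks share exploration requirements with the given task.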