Combining reinforcement learning with language grounding is challenging because the agent must explore the environment while simultaneously learning multiple language-conditioned tasks. To address this, we introduce the compositionally-enabled reinforcement learning language agent (CERLLA). Our method reduces the sample complexity of tasks specified in language by leveraging compositional policy representations and a semantic parser trained with reinforcement learning and in-context learning. We evaluate our approach in an environment requiring function approximation and demonstrate compositional generalization to novel tasks. On 162 tasks designed to test compositional generalization, our method significantly outperforms the previous best non-compositional baseline in sample complexity: it attains a higher success rate in fewer steps, matching an oracle policy's upper-bound success rate of 92%, whereas the baseline reaches only 80% with the same number of environment steps.
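To make the compositional idea concrete, below is a minimal sketch of how a parsed instruction might be mapped to a composition of reusable sub-policies. The abstract does not specify CERLLA's parser output, sub-policy interface, or composition operator, so everything here (the sub-task names, the keyword-matching parser, and sequential composition via done-predicates) is a hypothetical illustration, not the paper's implementation.

```python
# Hypothetical sketch of compositional, language-conditioned policies.
# All names and interfaces below are illustrative assumptions.
from typing import Callable, Dict, List, Tuple

Obs = dict                      # placeholder observation type
Policy = Callable[[Obs], int]   # maps an observation to a discrete action
Done = Callable[[Obs], bool]    # predicate: has this sub-goal been reached?

# Hypothetical library of independently learned sub-policies.
SUB_POLICIES: Dict[str, Tuple[Policy, Done]] = {
    "go to the key":   (lambda obs: 0, lambda obs: obs.get("at_key", False)),
    "pick up the key": (lambda obs: 1, lambda obs: obs.get("has_key", False)),
    "open the door":   (lambda obs: 2, lambda obs: obs.get("door_open", False)),
}

def parse_instruction(instruction: str) -> List[str]:
    """Stand-in for the learned semantic parser: a keyword match over
    known sub-tasks, preserving their order in the instruction."""
    hits = [(instruction.find(name), name)
            for name in SUB_POLICIES if name in instruction]
    return [name for _, name in sorted(hits)]

def compose(sub_tasks: List[str]) -> Policy:
    """Sequential composition: run each sub-policy until its done-predicate
    fires, then hand control to the next one."""
    def policy(obs: Obs) -> int:
        for name in sub_tasks:
            act, done = SUB_POLICIES[name]
            if not done(obs):
                return act(obs)
        return -1  # all sub-goals satisfied; no-op / terminate
    return policy

# Usage: a novel instruction built from familiar sub-tasks.
pi = compose(parse_instruction("pick up the key then open the door"))
print(pi({"at_key": True}))  # acts with the "pick up the key" sub-policy
```

Under this assumed design, a novel task combination can be solved by reusing sub-policies rather than learning a monolithic policy from scratch, which is one plausible reading of how compositional representations reduce sample complexity.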