Natural language-conditioned reinforcement learning (RL) enables agents to follow human instructions. Previous approaches generally implement language-conditioned RL by providing human instructions in natural language (NL) and training an instruction-following policy. In this outside-in approach, the policy must comprehend the NL and manage the task simultaneously. However, the unbounded variety of NL expressions often adds considerable complexity to solving concrete RL tasks, which can distract policy learning from completing the task. To ease the learning burden of the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and unique. The TL is used in RL to achieve highly efficient and effective policy training, and a translator is trained to translate NL into TL. We implement this scheme as TALAR (TAsk Language with predicAte Representation), which learns multiple predicates that model object relationships and serve as the TL. Experiments indicate that TALAR not only comprehends NL instructions better but also yields a better instruction-following policy, which improves the success rate by 13.4% and adapts to unseen expressions of NL instructions. The TL can also serve as an effective task abstraction that is naturally compatible with hierarchical RL.
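To make the inside-out scheme concrete, the following toy sketch separates the two roles described above: a translator that maps NL to a predicate-based TL, and a policy that is conditioned only on the TL. It is an illustration under strong assumptions, not TALAR itself: the object names, the two predicates, the synonym table, and the functions `translate` and `policy` are all hypothetical, and both the translator and the predicates are hand-written here, whereas TALAR learns them.

```python
from itertools import permutations

# Hypothetical toy setup: none of these names come from the paper.
OBJECTS = ("red_ball", "blue_ball", "drawer")
PREDICATES = ("near", "inside")  # hand-picked here; TALAR learns its predicates

# Every (predicate, arg1, arg2) slot of the task language, in a fixed order.
TL_SLOTS = [(p, a, b) for p in PREDICATES for a, b in permutations(OBJECTS, 2)]

# Hand-written synonym table standing in for a learned NL-to-TL translator.
SYNONYMS = {
    "near": ("next to", "close to", "near"),
    "inside": ("inside", "into", "in the"),
}


def translate(instruction: str) -> tuple:
    """Map an NL instruction to a binary TL vector over TL_SLOTS."""
    text = instruction.lower()
    # Order the mentioned objects by where they first appear in the sentence.
    mentioned = sorted(
        (text.find(name.replace("_", " ")), name)
        for name in OBJECTS
        if name.replace("_", " ") in text
    )
    args = tuple(name for _, name in mentioned[:2])
    goal = None
    for pred, phrases in SYNONYMS.items():
        if any(phrase in text for phrase in phrases):
            goal = (pred, *args)
            break
    # Exactly one slot is set when a goal is recognised, otherwise none.
    return tuple(int(slot == goal) for slot in TL_SLOTS)


def policy(observation: dict, tl: tuple) -> str:
    """Toy TL-conditioned policy: it reads the goal from TL, never from raw text."""
    if 1 not in tl:
        return "noop"
    pred, obj, target = TL_SLOTS[tl.index(1)]
    if observation["holding"] != obj:
        return f"pick({obj})"
    return f"place_{pred}({target})"


if __name__ == "__main__":
    for nl in ("Move the red ball close to the blue ball",
               "Put the red ball next to the blue ball",
               "Place the red ball into the drawer"):
        tl = translate(nl)
        print(nl, "->", tl, "->", policy({"holding": None}, tl))
```

In this sketch the two paraphrases of the "near" instruction collapse to the same TL vector, so the policy sees a single, unique task representation regardless of the wording. That is the property the inside-out scheme relies on: variation in NL is absorbed by the translator rather than by the policy.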