Grounding the instruction in the environment is a key step in solving language-guided goal-reaching reinforcement learning problems. In automated reinforcement learning, a central concern is enhancing the model's ability to generalize across various tasks and environments. In goal-reaching scenarios, the agent must comprehend the different parts of the instruction within the environmental context in order to complete the overall task successfully. In this work, we propose CAREL (Cross-modal Auxiliary REinforcement Learning) as a new framework to solve this problem, using auxiliary loss functions inspired by the video-text retrieval literature and a novel method called instruction tracking, which automatically keeps track of progress in an environment. Our experimental results suggest superior sample efficiency and systematic generalization for this framework in multi-modal reinforcement learning problems. Our code base is available here.