Offline Meta Reinforcement Learning (OMRL) aims to learn transferable knowledge from offline datasets in order to improve learning on new target tasks. Context-based Reinforcement Learning (RL) uses a context encoder to adapt the agent to a new task quickly: the encoder infers a task representation, and the policy is then adjusted based on this inferred representation. In this work, we focus on context-based OMRL, and specifically on the challenge of learning task representations for OMRL. Our experiments show that a context encoder trained on offline datasets can suffer from a distribution shift between the contexts used for training and those used for testing. To overcome this problem, we present a hard-sampling-based strategy for training a robust task context encoder. Experiments on diverse continuous control tasks show that our approach yields more robust task representations and better test performance, in terms of accumulated return, than baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/HS-OMRL.
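The abstract does not specify the encoder architecture or the exact hard-sampling rule. As a rough illustration only, the sketch below assumes a mean-pooled context encoder and an InfoNCE-style contrastive loss that keeps only the k negatives most similar to the anchor, i.e. the "hard" ones. `make_encoder`, `hard_negative_infonce`, the flattened transition layout and the toy tasks are all hypothetical and are not taken from the HS-OMRL code. The sketch computes the loss but performs no gradient updates.

```python
import math
import random
from typing import List, Sequence

Transition = Sequence[float]  # assumed layout: flattened (s, a, r, s') features


def make_encoder(in_dim: int, out_dim: int, seed: int = 0):
    """Hypothetical context encoder: a fixed random linear map applied to each
    transition, mean-pooled over the context, then unit-normalized. A real
    encoder would be a trained network; this only shows the data flow."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(context: List[Transition]) -> List[float]:
        pooled = [0.0] * out_dim
        for t in context:
            for i, row in enumerate(weights):
                pooled[i] += sum(w * x for w, x in zip(row, t))
        pooled = [p / len(context) for p in pooled]
        norm = math.sqrt(sum(p * p for p in pooled)) or 1.0
        return [p / norm for p in pooled]  # task representation z

    return encode


def cosine(u: List[float], v: List[float]) -> float:
    # Inputs are unit-length, so the dot product equals cosine similarity.
    return sum(a * b for a, b in zip(u, v))


def hard_negative_infonce(anchor, positive, negatives, k, temperature=0.1):
    """InfoNCE-style loss using only the k negatives closest to the anchor
    (hard negatives). Returns (loss, indices of the selected negatives)."""
    ranked = sorted(range(len(negatives)),
                    key=lambda j: cosine(anchor, negatives[j]), reverse=True)
    hard = ranked[:k]
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, negatives[j]) / temperature for j in hard]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0], hard  # -log softmax of the positive logit


# Toy usage: each hypothetical "task" offsets all transition features by a shift.
rng = random.Random(1)


def sample_context(task_shift: float, n: int = 8, dim: int = 5):
    return [[task_shift + rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]


encode = make_encoder(in_dim=5, out_dim=4)
anchor = encode(sample_context(1.0))     # context from the anchor task
positive = encode(sample_context(1.0))   # another context from the same task
negatives = [encode(sample_context(s)) for s in (-1.0, 0.5, 2.0, -2.0)]
loss, hard_idx = hard_negative_infonce(anchor, positive, negatives, k=2)
print(loss, hard_idx)
```

In this toy setup, tasks whose shift has the same sign as the anchor's tend to embed close to the anchor, so they are the ones selected as hard negatives. Concentrating the contrastive loss on such confusable tasks is one common way to make a representation more discriminative; whether it matches the paper's strategy should be checked against the released code.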