In this work, we address the challenge of zero-shot generalization (ZSG) in Reinforcement Learning (RL), where agents must adapt to entirely novel environments without additional training. We argue that understanding and utilizing contextual cues, such as the gravity level of the environment, is critical for robust generalization, and we propose to integrate the learning of context representations directly with policy learning. Our algorithm demonstrates improved generalization on various simulated domains, outperforming prior context-learning techniques in zero-shot settings. By jointly learning the policy and the context, our method acquires behavior-specific context representations, enabling adaptation to unseen environments and marking progress towards reinforcement learning systems that generalize across diverse real-world tasks. Our code and experiments are available at https://github.com/tidiane-camaret/contextual_rl_zero_shot.
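To make the idea of jointly learning a policy and a context representation concrete, the sketch below shows one minimal way such a setup could be wired: a context encoder averages embeddings of recent transitions into a context vector that is concatenated with the state before the policy head, so that gradients from the policy objective would flow through both sets of weights. All names, shapes, and architectural choices here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Illustrative dimensions (assumptions, not taken from the paper).
STATE_DIM, ACTION_DIM, CTX_DIM = 4, 2, 3

rng = np.random.default_rng(0)

# Parameters of the context encoder and the policy; in joint training,
# one gradient step would update both weight matrices together.
W_ctx = rng.normal(scale=0.1, size=(2 * STATE_DIM + ACTION_DIM, CTX_DIM))
W_pi = rng.normal(scale=0.1, size=(STATE_DIM + CTX_DIM, ACTION_DIM))

def encode_context(transitions):
    """Embed each (s, a, s') transition and average into one context vector.

    Environment properties such as gravity leave a signature in the
    s -> s' dynamics, which is what the encoder can pick up on.
    """
    feats = np.array([np.concatenate([s, a, s2]) for s, a, s2 in transitions])
    return np.tanh(feats @ W_ctx).mean(axis=0)

def policy(state, ctx):
    """Policy conditioned on both the raw state and the inferred context."""
    return np.tanh(np.concatenate([state, ctx]) @ W_pi)

# Usage: infer context from a short interaction history, then act with it.
history = [
    (rng.normal(size=STATE_DIM), rng.normal(size=ACTION_DIM), rng.normal(size=STATE_DIM))
    for _ in range(5)
]
ctx = encode_context(history)
action = policy(rng.normal(size=STATE_DIM), ctx)
```

At zero-shot test time, only the forward pass above would run: the agent observes a few transitions in the unseen environment, infers `ctx`, and conditions its behavior on it without any further training.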