The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training and novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm, GRAM, achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate on a variety of realistic simulated locomotion tasks with a quadruped robot.
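To make the idea of a robust adaptation module concrete, the sketch below shows one way such a mechanism could be structured: a context encoder infers a latent embedding from recent dynamics history, an out-of-distribution score compares that embedding against latent statistics gathered during training, and the policy input is blended toward a conservative robust embedding as the score grows. This is a minimal illustrative sketch; the linear encoder, dimensions, Mahalanobis-style score, and blending rule are our assumptions for exposition, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
HISTORY_DIM = 8   # flattened state-action history fed to the encoder
LATENT_DIM = 2    # context embedding size

# Stand-in "trained" context encoder: a fixed random linear map.
W_enc = rng.normal(size=(LATENT_DIM, HISTORY_DIM))

# Robust fallback embedding (e.g., one trained to induce conservative behavior).
z_robust = np.zeros(LATENT_DIM)

def ood_score(z, mean, cov_inv):
    """Mahalanobis distance of the inferred context latent from the
    in-distribution latent statistics collected during training."""
    d = z - mean
    return float(np.sqrt(d @ cov_inv @ d))

def adapt(history, mean, cov_inv, threshold=3.0):
    """Infer a context latent, then interpolate toward the robust
    embedding as the OOD score rises past the threshold."""
    z = W_enc @ history
    s = ood_score(z, mean, cov_inv)
    # alpha = 0: fully adaptive (in-distribution); alpha = 1: fully robust.
    alpha = min(1.0, max(0.0, s / threshold - 1.0))
    return (1.0 - alpha) * z + alpha * z_robust, s

# In-distribution latent statistics from simulated training rollouts.
train_latents = (W_enc @ rng.normal(size=(HISTORY_DIM, 1000))).T
mu = train_latents.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train_latents.T))

# A nominal history versus one from strongly perturbed dynamics.
z_id, s_id = adapt(rng.normal(size=HISTORY_DIM), mu, cov_inv)
z_ood, s_ood = adapt(10.0 * rng.normal(size=HISTORY_DIM), mu, cov_inv)
```

At deployment, the policy would consume the blended latent, so a single architecture behaves adaptively in familiar dynamics and falls back to robust behavior in unfamiliar ones, mirroring the unified in-distribution/out-of-distribution design described above.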