Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of tunable parameters from the start of training, all of which are randomized simultaneously to train a model robust enough for use in the real world. However, the combined randomization of many parameters increases the task difficulty and can result in sub-optimal policies. To address this problem and to provide a more flexible training process, we propose Continual Domain Randomization (CDR) for RL, which combines domain randomization with continual learning to enable sequential training in simulation on one subset of the randomization parameters at a time. Starting from a model trained in a non-randomized simulation, where the task is easier to solve, the model is trained on a sequence of randomizations, and continual learning is employed to remember the effects of previous randomizations. Our experiments on robotic reaching and grasping tasks show that a model trained in this fashion learns effectively in simulation and performs robustly on the real robot, matching or outperforming baselines that use combined randomization or sequential randomization without continual learning. Our code and videos are available at https://continual-dr.github.io/.
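To make the training schedule concrete, the following is a minimal sketch of the sequential process described above. All names here (make_sim_env, train_policy, continual_penalty, and the example randomization keys) are hypothetical placeholders, not the authors' implementation; the underlying RL algorithm and the specific continual-learning regularizer are deliberately left abstract.

```python
# Minimal sketch of a CDR-style training schedule, under the assumptions above.
# make_sim_env, train_policy, and continual_penalty are illustrative placeholders.

from typing import Any, Callable, Dict, List, Optional


def continual_domain_randomization(
    make_sim_env: Callable[[Optional[Dict[str, Any]]], Any],
    train_policy: Callable[..., Any],
    randomization_subsets: List[Dict[str, Any]],
    continual_penalty: Optional[Callable[..., float]] = None,
) -> Any:
    """Train a policy on one randomization subset at a time."""
    # Phase 0: learn the task in a non-randomized simulation, where it is
    # easiest to solve, to obtain the initial policy.
    policy = train_policy(env=make_sim_env(None), policy=None)

    # Phases 1..N: visit each randomization subset sequentially; the
    # continual-learning regularizer discourages forgetting the behavior
    # acquired under earlier randomizations.
    for subset in randomization_subsets:
        env = make_sim_env(subset)
        policy = train_policy(env=env, policy=policy, regularizer=continual_penalty)

    return policy


# Example schedule: dynamics randomization first, then visual randomization.
# The parameter names and ranges below are purely illustrative.
schedule = [
    {"joint_friction": (0.5, 1.5), "link_mass_scale": (0.8, 1.2)},
    {"texture": "random", "light_direction": "random"},
]
```

The key design point is that each phase starts from the policy produced by the previous phase, so only one source of variation is introduced at a time, while the continual-learning term preserves robustness to randomizations already seen.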