Visuomotor policies, which learn control directly from high-dimensional visual observations, struggle to adapt to new environments with intricate visual variations. Data augmentation is a promising way to bridge this generalization gap by enriching data variety. However, naively augmenting the entire observation can place an excessive burden on policy learning and may even degrade performance. In this paper, we propose to improve the generalization ability of visuomotor policies while preserving training stability in two ways: 1) We learn a control-aware mask through a self-supervised reconstruction task with three auxiliary losses, and then apply strong augmentation only to the control-irrelevant regions identified by the mask to reduce the generalization gap. 2) To address the training instability prevalent in visual reinforcement learning (RL), we distill knowledge from a pretrained RL expert that operates on low-level environment states into the student visuomotor policy. The policy is subsequently deployed to unseen environments without any further finetuning. We conduct comparison and ablation studies across several benchmarks: the DMControl Generalization Benchmark (DMC-GB), the enhanced Robot Manipulation Distraction Benchmark (RMDB), and a specialized long-horizon drawer-opening robotic task. Extensive experimental results demonstrate the effectiveness of our method, e.g., a 17% improvement over previous methods in the video-hard setting of DMC-GB.
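The two ingredients described in the abstract, mask-guided augmentation and expert-to-student distillation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the augmentation type or the distillation objective, so the overlay-style blend, the mean-squared-error loss, and the names `masked_augment`, `distillation_loss`, and `alpha` are all assumptions made for illustration. The mask is simply taken as given here, whereas in the paper it is learned through the self-supervised reconstruction task.

```python
import numpy as np


def masked_augment(obs, mask, distractor, alpha=0.5):
    """Apply a strong augmentation only to control-irrelevant pixels.

    obs, distractor: float arrays of shape (H, W, C) with values in [0, 1].
    mask: float array of shape (H, W); 1 marks control-relevant pixels,
          0 marks control-irrelevant pixels (assumed given in this sketch).
    alpha: blend strength of the overlay augmentation (an assumed choice).
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    augmented = (1.0 - alpha) * obs + alpha * distractor
    # Keep the original pixels where the mask is 1, augmented pixels elsewhere.
    return m * obs + (1.0 - m) * augmented


def distillation_loss(student_action, expert_action):
    """Mean squared error between student and expert actions (assumed loss)."""
    return float(np.mean((student_action - expert_action) ** 2))


if __name__ == "__main__":
    obs = np.zeros((4, 4, 3))
    distractor = np.ones((4, 4, 3))
    mask = np.zeros((4, 4))
    mask[:2, :2] = 1.0  # pretend the top-left patch is control-relevant

    out = masked_augment(obs, mask, distractor)
    print(out[0, 0], out[3, 3])
    print(distillation_loss(np.array([1.0, 2.0]), np.array([1.0, 4.0])))
```

In this sketch the control-relevant patch passes through unchanged while the rest of the image is blended with the distractor, which is the property the abstract attributes to augmenting only the control-irrelevant regions.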