Using multiple camera views simultaneously has been shown to improve the generalization and performance of visual policies. However, hardware cost and design constraints can make multi-camera setups impractical in real-world scenarios. In this study, we present a novel approach to improve the generalization of vision-based Reinforcement Learning (RL) algorithms for robotic manipulation tasks. The method uses knowledge distillation: a pre-trained ``teacher'' policy trained with multiple camera viewpoints guides a ``student'' policy that learns from a single camera viewpoint. To make the student policy robust to perturbations of the camera location, it is trained with data augmentation and extreme viewpoint changes. As a result, the student policy learns robust visual features that allow it to locate the object of interest accurately and consistently, regardless of the camera viewpoint. We evaluated the efficacy and efficiency of the proposed method in both simulation and real-world environments. The results demonstrate that the single-view student policy can learn to grasp and lift a challenging object, which a single-view policy trained without the teacher's guidance could not achieve. Furthermore, the student policy exhibits zero-shot transfer: it grasps and lifts objects in real-world scenarios under unseen visual configurations.
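The teacher-student setup described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the student imitates the teacher's actions through a mean-squared-error loss (the abstract does not specify the distillation objective), replaces camera images with hypothetical linear projections of a scene state, uses a linear student, and stands in for viewpoint perturbation with additive Gaussian noise. All names (`render_views`, `teacher_policy`, `augment`, `train_student`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, FEAT_DIM, ACTION_DIM, NUM_VIEWS = 4, 8, 2, 3

# Hypothetical stand-in for camera views: each view is a different
# linear projection of the same underlying scene state.
projections = [rng.normal(0.0, 0.5, (STATE_DIM, FEAT_DIM)) for _ in range(NUM_VIEWS)]


def render_views(states):
    """Return one feature batch per camera for the given scene states."""
    return [states @ p for p in projections]


# Frozen "teacher": a fixed policy that consumes all views at once.
teacher_weights = rng.normal(0.0, 0.3, (NUM_VIEWS * FEAT_DIM, ACTION_DIM))


def teacher_policy(views):
    """Map the concatenated multi-view features to bounded actions."""
    return np.tanh(np.concatenate(views, axis=1) @ teacher_weights)


def augment(view, noise_scale=0.1):
    """Perturb the student's input; a crude proxy for viewpoint changes."""
    return view + rng.normal(0.0, noise_scale, view.shape)


def distillation_loss(student_actions, teacher_actions):
    """Mean squared error between student and teacher actions (assumed objective)."""
    return float(np.mean((student_actions - teacher_actions) ** 2))


def train_student(steps=200, batch_size=256, lr=0.05):
    """Fit a linear single-view student to the teacher's actions."""
    student_weights = np.zeros((FEAT_DIM, ACTION_DIM))

    # Fixed, unaugmented batch used only to measure progress.
    eval_views = render_views(rng.normal(size=(batch_size, STATE_DIM)))
    eval_targets = teacher_policy(eval_views)
    initial_loss = distillation_loss(eval_views[0] @ student_weights, eval_targets)

    for _ in range(steps):
        views = render_views(rng.normal(size=(batch_size, STATE_DIM)))
        targets = teacher_policy(views)   # teacher sees every camera
        student_in = augment(views[0])    # student sees one perturbed camera
        error = student_in @ student_weights - targets
        grad = 2.0 * student_in.T @ error / error.size  # gradient of the MSE
        student_weights -= lr * grad

    final_loss = distillation_loss(eval_views[0] @ student_weights, eval_targets)
    return initial_loss, final_loss


initial_loss, final_loss = train_student()
```

In this toy setting the student only ever receives the first view, yet its loss against the teacher's actions drops during training because that single view still carries information about the underlying scene state. A real system would replace the linear maps with image encoders and the noise with actual camera-pose and image augmentations.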