This paper focuses on scalable robot learning for manipulation with dexterous robot arm-hand systems, where remote human-robot interaction via augmented reality (AR) is established to collect expert demonstration data and improve learning efficiency. In such a system, we present a unified framework to address general manipulation tasks. Specifically, the proposed method consists of two phases: i) In the first, pretraining phase, an initial policy is obtained through behavior cloning (BC), leveraging demonstration data from our AR-based remote human-robot interaction system; ii) In the second phase, a contrastive-learning-empowered reinforcement learning (RL) method is developed to obtain a policy more efficient and robust than the BC one, for which a projection head is designed to accelerate learning. An event-driven augmented reward is adopted to enhance safety. To validate the proposed method, both physics simulations in PyBullet and real-world experiments are carried out. The results demonstrate that, compared to the classic proximal policy optimization and soft actor-critic policies, our method not only significantly speeds up inference but also achieves a much higher success rate on the manipulation tasks. An ablation study confirms that the proposed RL with contrastive learning overcomes policy collapse. Supplementary demonstrations are available at https://cyberyyc.github.io/.
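The abstract mentions a projection head used to accelerate contrastive representation learning within the RL phase. The paper's actual architecture and loss are not given here; the following is a minimal NumPy sketch of the common pattern (a two-layer MLP projection head feeding an InfoNCE-style loss over two augmented views), with all layer sizes, the temperature, and the augmentation noise chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def projection_head(z, W1, W2):
    """Two-layer MLP projection head (hypothetical sizes) with ReLU,
    followed by L2 normalization of the projected features."""
    h = np.maximum(z @ W1, 0.0)
    p = h @ W2
    return p / np.linalg.norm(p, axis=1, keepdims=True)

def info_nce_loss(p_a, p_b, temperature=0.1):
    """InfoNCE-style contrastive loss: each row of p_a should match the
    same-index row of p_b (positive pair) against all other rows."""
    logits = (p_a @ p_b.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy batch: 8 encoder features of dimension 32 (dimensions are assumptions).
z = rng.normal(size=(8, 32))
W1 = rng.normal(size=(32, 64)) * 0.1
W2 = rng.normal(size=(64, 16)) * 0.1

# Two "views" of the same states via small perturbations (illustrative only).
p_a = projection_head(z + 0.01 * rng.normal(size=z.shape), W1, W2)
p_b = projection_head(z + 0.01 * rng.normal(size=z.shape), W1, W2)
loss = info_nce_loss(p_a, p_b)
print(round(float(loss), 4))
```

In practice the projection head is typically trained jointly with the RL encoder and discarded at inference time, which is consistent with the reported inference speedup, though the paper's exact design may differ.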
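The event-driven augmented reward described in the abstract can be read as adding sparse reward terms only when discrete safety-relevant events fire, on top of a dense task reward. The event names and penalty magnitudes below are assumptions for illustration, not values from the paper.

```python
def augmented_reward(base_reward, events, event_terms=None):
    """Event-driven augmented reward (illustrative sketch): add a fixed
    bonus or penalty for each named event observed this step.
    'events' is an iterable of event-name strings; unknown names add 0."""
    if event_terms is None:
        # Hypothetical event set and magnitudes (not from the paper).
        event_terms = {"collision": -5.0, "joint_limit": -1.0, "success": 10.0}
    return base_reward + sum(event_terms.get(e, 0.0) for e in events)

# Dense task reward of 0.3, with and without a collision event.
print(round(augmented_reward(0.3, ["collision"]), 2))
print(round(augmented_reward(0.3, []), 2))
```

Shaping the reward only at event boundaries keeps the dense signal unchanged in nominal operation while strongly discouraging unsafe transitions, which matches the stated safety motivation.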