Sim-and-real training is a promising alternative to sim-to-real training for robot manipulation. However, current sim-and-real training is neither efficient (it converges slowly to the optimal policy) nor effective (it requires a sizeable amount of real-world robot data). Given limited time and hardware budgets, its performance is unsatisfactory. In this paper, we propose a Consensus-based Sim-And-Real deep reinforcement learning algorithm (CSAR) for manipulator pick-and-place tasks, which achieves comparable performance in simulation and in the real world. In this algorithm, we train agents in simulators and in the real world to obtain the optimal policies for both. We found two interesting phenomena: (1) the best policy in simulation is not the best for sim-and-real training; (2) the more simulation agents, the better the sim-and-real training performs. The experimental video is available at: https://youtu.be/mcHJtNIsTEQ.