Most successes in robotic manipulation have been restricted to single-arm gripper robots, whose low dexterity limits the range of solvable tasks to pick-and-place, insertion, and object rearrangement. More complex tasks such as assembly require dual- and multi-arm platforms, but these entail a suite of unique challenges, including bi-arm coordination and collision avoidance, robust grasping, and long-horizon planning. In this work we investigate the feasibility of training deep reinforcement learning (RL) policies in simulation and transferring them to the real world (Sim2Real) as a generic methodology for obtaining performant controllers for real-world bi-manual robotic manipulation tasks. As a testbed for bi-manual manipulation, we develop the U-Shape Magnetic Block Assembly Task, wherein two robots with parallel grippers must connect three magnetic blocks to form a U-shape. Without manually designed controllers or human demonstrations, we demonstrate that, with careful Sim2Real considerations, our policies trained with RL in simulation enable two xArm6 robots to solve the U-shape assembly task with a success rate above 90% in simulation and 50% on real hardware, without any additional real-world fine-tuning. Through careful ablations, we highlight how each component of the system is critical for such simple and successful policy learning and transfer, including task specification, learning algorithm, direct joint-space control, behavior constraints, perception and actuation noise, action delays, and action interpolation. Our results present a significant step forward for bi-arm capability on real hardware, and we hope our system can inspire future research on deep RL and Sim2Real transfer of bi-manual policies, drastically scaling up the capability of real-world robot manipulators.