Soft robots are gaining popularity thanks to their intrinsic safety in contact and their adaptability. However, their potentially infinite number of degrees of freedom makes modeling them a daunting task, and in many cases only an approximate description is available. This makes reinforcement learning (RL) based approaches inefficient when deployed in realistic scenarios, because of the large domain gap between the models and the real platform. In this work, we demonstrate, for the first time, how Domain Randomization (DR) can solve this problem by enhancing RL policies with: i) higher robustness with respect to environmental changes; ii) higher affordability of learned policies when the target model differs significantly from the training model; iii) higher effectiveness of the policy, which can even autonomously learn to exploit the environment to increase the robot's capabilities (environmental constraints exploitation). Moreover, we introduce a novel algorithmic extension of previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects. We provide results on four different tasks and two soft robot designs, opening interesting perspectives for future research on RL for closed-loop soft robot control.
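As a rough illustration of the general idea behind domain randomization (not the method or simulator used in this work), the sketch below resamples the dynamics parameters of a toy system at every episode and scores a controller by its average return across the sampled dynamics. The 1-D mass-spring-damper, the proportional controller, the parameter ranges in `BOUNDS`, and all function names are assumptions introduced for illustration only.

```python
import random

# Hypothetical parameter ranges (illustrative values, not taken from the paper).
BOUNDS = {"mass": (0.5, 2.0), "stiffness": (5.0, 20.0), "damping": (0.1, 1.0)}


def sample_dynamics(rng, bounds=BOUNDS):
    """Draw one set of dynamics parameters uniformly within the given bounds."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}


def rollout(params, gain, target=1.0, dt=0.01, steps=500):
    """Simulate a 1-D mass-spring-damper under proportional control.

    Returns the negative accumulated squared tracking error (higher is better).
    This toy system is only a stand-in for a soft robot simulator.
    """
    x, v, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        u = gain * (target - x)  # proportional control action
        a = (u - params["stiffness"] * x - params["damping"] * v) / params["mass"]
        v += a * dt  # semi-implicit Euler: update velocity first
        x += v * dt  # then position, using the new velocity
        total -= (target - x) ** 2 * dt
    return total


def randomized_return(gain, episodes=50, seed=0):
    """Average return of one controller over freshly sampled dynamics."""
    rng = random.Random(seed)
    returns = [rollout(sample_dynamics(rng), gain) for _ in range(episodes)]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    for gain in (10.0, 50.0):
        print(gain, randomized_return(gain))
```

In an RL setting, the fixed-gain controller would be replaced by a learned policy and the averaged score by the training objective. The point of the sketch is only that each episode sees a different draw of the dynamics, so the policy cannot overfit to a single, possibly inaccurate, model.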