Soft robots are gaining popularity thanks to their intrinsic safety during contact and their adaptability. However, their potentially infinite number of degrees of freedom makes modeling a daunting task, and in many cases only an approximate description is available. This challenge renders reinforcement learning (RL) approaches inefficient when deployed in realistic scenarios, due to the large domain gap between models and the real platform. In this work, we demonstrate, for the first time, how Domain Randomization (DR) can solve this problem by enhancing RL policies for soft robots with: i) robustness with respect to unknown dynamics parameters; ii) reduced training times, achieved by exploiting drastically simpler dynamic models for learning; iii) better environment exploration, which can lead to exploitation of environmental constraints for optimal performance. Moreover, we introduce a novel algorithmic extension of previous adaptive domain randomization methods for the automatic inference of dynamics parameters of deformable objects. We provide an extensive evaluation in simulation on four different tasks and two soft robot designs, opening interesting perspectives for future research on reinforcement learning for closed-loop soft robot control.
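To make the core idea of Domain Randomization concrete, the sketch below resamples uncertain dynamics parameters (e.g., stiffness and damping of a soft link) at every episode reset, so an RL policy trained in this wrapper never overfits to a single parameter setting. This is a minimal illustrative toy, not the paper's implementation: the class names, the 1-D damped-spring dynamics, and the uniform parameter ranges are all assumptions chosen for clarity.

```python
import random


class SoftArmEnv:
    """Toy 1-D soft-link environment (hypothetical stand-in for a real simulator).

    Dynamics: a damped spring driven by an input torque.
    """

    def __init__(self, stiffness=10.0, damping=0.5):
        self.stiffness = stiffness
        self.damping = damping
        self.pos, self.vel = 0.0, 0.0

    def step(self, torque, dt=0.01):
        # Acceleration = input minus elastic and viscous restoring terms.
        acc = torque - self.stiffness * self.pos - self.damping * self.vel
        self.vel += acc * dt
        self.pos += self.vel * dt
        return self.pos


class DomainRandomizedEnv:
    """Domain Randomization wrapper: resamples dynamics parameters
    from fixed uniform ranges at each episode reset, so the policy is
    trained across a distribution of plausible dynamics rather than a
    single (possibly wrong) nominal model.
    """

    def __init__(self, stiffness_range=(5.0, 20.0),
                 damping_range=(0.1, 1.0), seed=0):
        self.stiffness_range = stiffness_range
        self.damping_range = damping_range
        self.rng = random.Random(seed)
        self.env = None

    def reset(self):
        # Sample a fresh set of dynamics parameters for this episode.
        k = self.rng.uniform(*self.stiffness_range)
        c = self.rng.uniform(*self.damping_range)
        self.env = SoftArmEnv(stiffness=k, damping=c)
        return self.env.pos

    def step(self, torque):
        return self.env.step(torque)
```

In an adaptive-DR variant, the sampling ranges themselves would be updated from data instead of staying fixed, which is the direction of the algorithmic extension described above.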