Reinforcement learning (RL) has emerged as a promising paradigm for complex, continuous robotic tasks; however, safe exploration remains one of the main challenges, especially in contact-rich manipulation tasks in unstructured environments. To address this issue, we propose SRL-VIC: a model-free safe RL framework combined with a variable impedance controller (VIC). Specifically, the safety critic and recovery policy networks are pre-trained: the safety critic assigns a risk value to the next action before it is executed, and the recovery policy suggests a corrective action if that risk value is high. The policies are then updated online, where the task policy not only accomplishes the task but also modulates the stiffness parameters to maintain a safe and compliant profile. A set of experiments in contact-rich maze tasks demonstrates that our framework outperforms the baselines (without the recovery mechanism and without the VIC), yielding a good trade-off between efficient task accomplishment and safety guarantees. We show that our policy, trained in simulation, can be deployed on a physical robot without fine-tuning, achieving successful task completion with robustness and generalization. The video is available at https://youtu.be/ksWXR3vByoQ.
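The safety-critic/recovery mechanism described above can be sketched as a simple action-filtering loop. This is a minimal illustration, not the paper's implementation; all names (`task_policy`, `safety_critic`, `recovery_policy`, `eps_risk`) are hypothetical placeholders for the learned networks and risk threshold.

```python
def select_action(state, task_policy, safety_critic, recovery_policy, eps_risk=0.5):
    """Illustrative sketch: screen the task policy's proposed action with a
    safety critic, and fall back to the recovery policy when risk is high."""
    action = task_policy(state)           # proposed action (could include stiffness parameters)
    risk = safety_critic(state, action)   # scalar risk value estimated before execution
    if risk > eps_risk:                   # risk deemed too high to execute
        action = recovery_policy(state)   # corrective action suggested instead
    return action
```

In this scheme only the screening happens per step; the pre-training of the critic and recovery networks, and the online update of the task policy, proceed separately as described above.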