Reinforcement learning (RL) approaches based on Markov Decision Processes (MDPs) are predominantly applied in the robot joint space, often relying on limited task-specific information and only partial awareness of the 3D environment. In contrast, episodic RL has demonstrated advantages over traditional MDP-based methods in terms of trajectory consistency, task awareness, and overall performance in complex robotic tasks. Moreover, both traditional step-wise and episodic RL methods often neglect the contact-rich information inherent in task-space manipulation, particularly with respect to contact safety and robustness. In this work, contact-rich manipulation tasks are tackled using a task-space, energy-safe framework, in which reliable and safe task-space trajectories are generated by combining Proximal Policy Optimization (PPO) with movement primitives. Furthermore, an energy-aware Cartesian impedance controller objective is incorporated within the proposed framework to ensure safe interactions between the robot and the environment. Our experimental results demonstrate that the proposed framework outperforms existing methods in handling tasks on various types of surfaces in 3D environments, achieving high success rates, smooth trajectories, and energy-safe interactions.
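To make the energy-aware impedance-control idea concrete, the following is a minimal sketch of a task-space spring-damper control step gated by a virtual energy tank. All names, gains, and the tank/budget formulation here are illustrative assumptions, not the paper's actual controller.

```python
import numpy as np

def impedance_wrench(K, D, x, x_des, xdot):
    """Task-space spring-damper wrench: F = K (x_des - x) - D xdot."""
    return K @ (x_des - x) - D @ xdot

def energy_safe_step(K, D, x, x_des, xdot, tank, E_min, dt):
    """One control step with an energy-budget check (hypothetical sketch).

    If executing the commanded wrench over dt would drain the virtual
    energy tank below E_min, the wrench is scaled down so the tank lands
    exactly at the floor, keeping the interaction energy-bounded.
    """
    F = impedance_wrench(K, D, x, x_des, xdot)
    power_out = float(F @ xdot)            # power injected along the motion
    if power_out * dt > tank - E_min:      # budget would be exhausted
        scale = max(0.0, (tank - E_min) / max(power_out * dt, 1e-9))
        F = scale * F                      # attenuate the commanded wrench
        power_out = float(F @ xdot)
    tank -= power_out * dt                 # dissipated motion refills the tank
    return F, tank
```

Because scaling the wrench scales the injected power linearly, the attenuated step drains the tank to exactly `E_min` rather than overshooting it; negative power (the environment doing work on the controller) refills the tank.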