Many state-of-the-art robotic applications use series elastic actuators (SEAs) with closed-loop force control to achieve complex tasks such as walking, lifting, and manipulation. Model-free PID control methods are more prone to instability because of nonlinearities in the SEA, whereas cascaded model-based robust controllers can remove these effects to achieve stable force control. However, these model-based methods require detailed investigation to characterize the system accurately. Deep reinforcement learning (DRL) has proved to be an effective model-free method for continuous control tasks, but few works address learning directly on hardware. This paper describes the process of training a DRL policy on the hardware of an SEA pendulum system to track force trajectories from 0.05 to 0.35 Hz at 50 N amplitude using the Proximal Policy Optimization (PPO) algorithm. Safety mechanisms are developed and used to train the policy for 12 hours (overnight) without an operator present, within the full 21-hour training period. Tracking performance improves by 25 N in mean absolute error when the first 18 minutes of training are compared with the full 21 hours, for a 50 N amplitude, 0.1 Hz sinusoidal desired force trajectory. Finally, the DRL policy exhibits better tracking and stability margins than a model-free PID controller for a 50 N chirp force trajectory.
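To make the evaluation quantities concrete, the following is a minimal sketch of how the reference force trajectories and the mean absolute error could be computed. It is an illustration and not the authors' code: the function names, the 100 Hz sampling rate, the linear form of the chirp sweep, and the synthetic "measured" signal are all assumptions. Only the 50 N amplitude, the 0.1 Hz sinusoid, and the 0.05 to 0.35 Hz range come from the abstract.

```python
import numpy as np


def sine_force(t, amplitude=50.0, freq_hz=0.1):
    """Sinusoidal desired force in newtons (50 N, 0.1 Hz as in the abstract)."""
    return amplitude * np.sin(2.0 * np.pi * freq_hz * t)


def chirp_force(t, amplitude=50.0, f_start=0.05, f_end=0.35):
    """Chirp desired force in newtons.

    Assumption: the frequency sweeps linearly from f_start to f_end over the
    span of t. The abstract does not state the sweep profile.
    """
    tau = t - t[0]
    duration = tau[-1]
    # Phase is the integral of the instantaneous frequency
    # f(tau) = f_start + (f_end - f_start) * tau / duration.
    phase = 2.0 * np.pi * (
        f_start * tau + 0.5 * (f_end - f_start) * tau**2 / duration
    )
    return amplitude * np.sin(phase)


def mean_absolute_error(desired, measured):
    """Mean absolute force tracking error, in the same units as the inputs."""
    desired = np.asarray(desired, dtype=float)
    measured = np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(desired - measured)))


# Hypothetical usage: 20 s sampled at an assumed 100 Hz.
t = np.arange(0.0, 20.0, 0.01)
desired = sine_force(t)
measured = 0.9 * desired  # synthetic stand-in for a measured SEA force
mae = mean_absolute_error(desired, measured)
print(f"MAE on synthetic data: {mae:.2f} N")
```

Under these assumptions, the same `mean_absolute_error` helper could be applied to logged force data from an early and a late training checkpoint to obtain a comparison like the one reported above.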