Deep Reinforcement Learning (DRL) has achieved remarkable success, ranging from complex computer games to real-world applications, showing the potential for intelligent agents capable of learning in dynamic environments. However, its application in real-world scenarios presents challenges, including the jerky-action problem: jerky trajectories not only compromise system safety but also increase power consumption and shorten the service life of robotic and autonomous systems. To address jerky actions, a method called conditioning for action policy smoothness (CAPS) was proposed, which adds regularization terms to reduce action changes. This paper further proposes a novel method, named Gradient-based CAPS (Grad-CAPS), which modifies CAPS by reducing the difference in the gradient of actions and then applies displacement normalization so that the agent adapts to different action scales. Consequently, our method effectively reduces zigzagging action sequences while enhancing policy expressiveness and adaptability across diverse scenarios and environments. In the experiments, we integrated Grad-CAPS with different reinforcement learning algorithms and evaluated its performance on various robotics-related tasks in the DeepMind Control Suite and OpenAI Gym environments. The results demonstrate that Grad-CAPS effectively improves performance while maintaining a level of smoothness comparable to that of CAPS and vanilla agents.
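To illustrate the distinction the abstract draws, the sketch below contrasts a CAPS-style first-order penalty on action changes with a Grad-CAPS-style penalty on changes in the action gradient, using displacement normalization for scale invariance. This is a minimal NumPy sketch under our own assumptions about the loss forms; the function names and the exact normalization are illustrative, not the paper's implementation.

```python
import numpy as np

def caps_temporal_reg(actions):
    """CAPS-style temporal smoothness: mean ||a_{t+1} - a_t||.

    Penalizes any change in action, including steady, smooth motion.
    `actions` is a (T, action_dim) array of consecutive actions.
    """
    diffs = np.diff(actions, axis=0)
    return np.mean(np.linalg.norm(diffs, axis=1))

def grad_caps_reg(actions, eps=1e-8):
    """Grad-CAPS-style penalty (illustrative): difference of action gradients.

    Each step displacement is normalized to unit length (displacement
    normalization, assumed here to make the penalty invariant to action
    scale), then the change in direction between consecutive displacements
    is penalized. A straight, steadily moving trajectory incurs no cost;
    a zigzagging one incurs a large cost.
    """
    d = np.diff(actions, axis=0)                               # first-order differences
    d_hat = d / (np.linalg.norm(d, axis=1, keepdims=True) + eps)  # unit displacements
    return np.mean(np.linalg.norm(np.diff(d_hat, axis=0), axis=1))
```

For example, a uniformly increasing 1-D action sequence is penalized by the CAPS-style term but not by the gradient-based term, whereas an alternating zigzag sequence is penalized heavily by the latter, matching the abstract's claim that Grad-CAPS targets zigzagging rather than all motion.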