Reinforcement learning is highly competitive among gait-generation techniques for quadrupedal robots, largely because the stochastic exploration in reinforcement training helps achieve an autonomous gait. Nevertheless, although incremental reinforcement learning exploits the continuity inherent in limb movements to improve training success rates and movement smoothness, adapting the gait policy to diverse terrains and external disturbances remains challenging. Inspired by the association between reinforcement learning and the evolution of animal locomotion behavior, this paper introduces a self-improvement mechanism for the reference gait, so that incremental learning of actions and self-improvement of the reference action proceed together, imitating the evolution of animal locomotion. On this basis, a new framework for reinforcement training of quadruped gaits is proposed. In this framework, a genetic algorithm performs a global probabilistic search over the initial values of the foot trajectory and updates the reference trajectory whenever a candidate with better fitness is found. The improved reference gait is then used for incremental reinforcement learning of the gait. These two steps are executed repeatedly and alternately until the gait policy is trained. A detailed simulation-based analysis covering terrain, model dimensions, and locomotion conditions shows that the framework adapts to terrain significantly better than regular incremental reinforcement learning.
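The alternating structure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the trajectory encoding, the fitness function, and the `incremental_rl_update` step are all hypothetical stand-ins (the real framework evaluates fitness in simulation and trains the policy with incremental reinforcement learning rather than a simple pull toward the reference).

```python
import random

def fitness(ref):
    # Hypothetical fitness: prefer smooth, bounded foot-trajectory
    # parameters (a stand-in for simulation-based gait evaluation).
    smooth = -sum((a - b) ** 2 for a, b in zip(ref, ref[1:]))
    bound = -sum(x * x for x in ref)
    return smooth + 0.1 * bound

def ga_step(population, elite=2, mut=0.1):
    # One genetic-algorithm generation: rank by fitness, keep elites,
    # then fill the population with mutated crossover children.
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:max(elite, len(population) // 2)]
    children = []
    while len(children) < len(population) - elite:
        pa, pb = random.sample(parents, 2)
        cut = random.randrange(1, len(pa))         # one-point crossover
        child = [x + random.gauss(0, mut)          # Gaussian mutation
                 for x in pa[:cut] + pb[cut:]]
        children.append(child)
    return ranked[:elite] + children

def incremental_rl_update(policy, reference, lr=0.5):
    # Placeholder for incremental RL: nudge the policy toward the
    # improved reference gait (the real method trains with rewards).
    return [p + lr * (r - p) for p, r in zip(policy, reference)]

def train(dim=6, pop_size=12, outer_iters=10, ga_iters=20, seed=0):
    # Alternate the two phases: GA improves the reference trajectory,
    # then the policy is updated against that reference.
    random.seed(seed)
    population = [[random.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(pop_size)]
    policy = [0.0] * dim
    for _ in range(outer_iters):
        for _ in range(ga_iters):
            population = ga_step(population)
        reference = max(population, key=fitness)   # best reference gait
        policy = incremental_rl_update(policy, reference)
    return policy, fitness(max(population, key=fitness))
```

Because the elites are carried over unchanged, the best fitness in the population is non-decreasing across generations, which is what lets the reference gait only ever be replaced by a better one.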