Reinforcement learning (RL) has emerged as a powerful method for learning robust control policies for bipedal locomotion. Yet it can be difficult to tune desired robot behaviors because reward design is often unintuitive and complex. In comparison, trajectory optimization-based methods offer more tunable, interpretable, and mathematically grounded motion plans for high-dimensional legged systems. However, these methods often remain brittle to real-world disturbances such as external perturbations. In this work, we present NaviGait, a hierarchical framework that combines the structure of trajectory optimization with the adaptability of RL for robust and intuitive locomotion control. NaviGait leverages RL to synthesize new motions by selecting, minimally morphing, and stabilizing gaits drawn from an offline-generated gait library. The resulting walking policies track the reference motion closely while maintaining robustness comparable to other locomotion controllers. Additionally, the structure imposed by NaviGait drastically simplifies the RL reward composition. Our experimental results demonstrate that NaviGait enables faster training than conventional and imitation-based RL, and produces motions that remain closest to the original reference. Overall, by decoupling high-level motion generation from low-level correction, NaviGait offers a more scalable and generalizable approach to dynamic and robust locomotion. Videos and the full framework are publicly available at https://dynamicmobility.github.io/navigait/