Learning Legged MPC with Smooth Neural Surrogates

Deep learning and model predictive control (MPC) can play complementary roles in legged robotics. However, integrating learned models with online planning remains challenging. When dynamics are learned with neural networks, three key difficulties arise: (1) stiff transitions from contact events may be inherited from the data; (2) additional non-physical local nonsmoothness can occur; and (3) training datasets can induce non-Gaussian model errors due to rapid state changes. We address (1) and (2) by introducing the smooth neural surrogate, a neural network with tunable smoothness designed to provide informative predictions and derivatives for trajectory optimization through contact. To address (3), we train these models using a heavy-tailed likelihood that better matches the empirical error distributions observed in legged-robot dynamics. Together, these design choices substantially improve the reliability, scalability, and generalizability of learned legged MPC. Across zero-shot locomotion tasks of increasing difficulty, smooth neural surrogates with robust learning yield consistent reductions in cumulative cost on simple, well-conditioned behaviors (typically 10-50%), while providing substantially larger gains in regimes where standard neural dynamics often fail outright. In these regimes, smoothing enables reliable execution (from 0/5 to 5/5 success) and produces about 2-50x lower cumulative cost, reflecting orders-of-magnitude absolute improvements in robustness rather than incremental performance gains.

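As a rough illustration of the two ingredients named in the abstract, the sketch below compares a Gaussian negative log-likelihood with a heavy-tailed one, and shows an activation whose smoothness is set by a single parameter. The abstract does not specify which heavy-tailed likelihood or which smoothing mechanism is used, so the Student-t likelihood and the temperature-scaled softplus here are assumptions chosen for illustration. The names `student_t_nll`, `gaussian_nll`, `smooth_relu`, `nu`, and `beta` are likewise hypothetical.

```python
import math


def gaussian_nll(residual: float, scale: float) -> float:
    """Negative log-likelihood of a residual under N(0, scale^2)."""
    z = residual / scale
    return 0.5 * math.log(2.0 * math.pi) + math.log(scale) + 0.5 * z * z


def student_t_nll(residual: float, scale: float, nu: float) -> float:
    """Negative log-likelihood of a residual under a Student-t distribution.

    Assumed stand-in for a "heavy-tailed likelihood": `nu` is the degrees of
    freedom, and smaller values give heavier tails.
    """
    z = residual / scale
    return (
        math.lgamma(0.5 * nu)
        - math.lgamma(0.5 * (nu + 1.0))
        + 0.5 * math.log(nu * math.pi)
        + math.log(scale)
        + 0.5 * (nu + 1.0) * math.log1p(z * z / nu)
    )


def smooth_relu(x: float, beta: float) -> float:
    """Softplus with sharpness `beta`, an assumed example of tunable smoothness.

    Large beta approaches ReLU (a kink at zero); small beta gives a smoother
    function with informative derivatives on both sides of zero.
    """
    z = beta * x
    # Numerically stable form of log(1 + exp(z)) / beta.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


if __name__ == "__main__":
    # A large residual, e.g. from a contact event, costs far less under the
    # heavy-tailed loss: its penalty grows logarithmically, not quadratically.
    for r in (0.5, 2.0, 10.0):
        print(r, gaussian_nll(r, 1.0), student_t_nll(r, 1.0, nu=3.0))
```

Under these assumptions, a rare large prediction error dominates a Gaussian loss but contributes only modestly to the Student-t loss, which is the usual motivation for heavy-tailed training objectives when errors are non-Gaussian.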