Humans excel at robust bipedal walking in complex natural environments. In each step, they tune the interaction of biomechanical muscle dynamics and neuronal signals so that walking remains robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem of balancing stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on one particular motion at a time, and the resulting controllers are limited in their ability to compensate for perturbations. In robotics, reinforcement learning~(RL) methods have recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing the robustness that RL provides might pave the way for a novel approach to studying human walking in complex natural environments.