Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous movements despite their vast number of muscles. Reinforcement learning (RL) on large musculoskeletal models, however, has not been able to show similar performance. We conjecture that ineffective exploration in large overactuated action spaces is a key problem. This is supported by the finding that common exploration noise strategies are inadequate in synthetic examples of overactuated systems. We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able to induce state-space-covering exploration within seconds of interaction. By integrating DEP into RL, we achieve fast learning of reaching and locomotion in musculoskeletal systems, outperforming current approaches in all considered tasks in sample efficiency and robustness.
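
One intuition for why uncorrelated exploration noise may struggle in overactuated systems can be sketched with a toy model. The setup below is an assumption made purely for illustration and is not one of the synthetic examples referred to above: a single degree of freedom is driven by the average of `n_actuators` redundant actuators, each receiving independent Gaussian noise. The function name `net_force_std` and all parameter values are hypothetical. Under this assumption the noise contributions partly cancel, so the standard deviation of the net force shrinks roughly as one over the square root of the number of actuators.

```python
import numpy as np


def net_force_std(n_actuators, sigma=1.0, n_samples=20_000, seed=0):
    """Empirical std of the net force on one degree of freedom.

    Toy assumption: the net force is the mean of n_actuators actuator
    commands, each perturbed by independent Gaussian noise of scale sigma.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_samples, n_actuators))
    # Averaging over redundant actuators cancels much of the independent noise.
    return noise.mean(axis=1).std()


std_single = net_force_std(1)    # one actuator: full noise scale
std_many = net_force_std(100)    # 100 redundant actuators
ratio = std_many / std_single    # expected near 1/sqrt(100) in this toy model

print(f"std with 1 actuator:    {std_single:.3f}")
print(f"std with 100 actuators: {std_many:.3f}")
print(f"ratio:                  {ratio:.3f}")
```

In this sketch the effective perturbation of the joint becomes much smaller as redundancy grows, which is consistent with the conjecture that exploration needs correlations across actuators. It does not implement DEP, and it says nothing about how DEP generates such correlations.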