Learning effective continuous control policies in high-dimensional systems, including musculoskeletal agents, remains a significant challenge. Over the course of biological evolution, organisms have developed robust mechanisms for overcoming this complexity and learning highly sophisticated motor control strategies. What accounts for this robust behavioral flexibility? Modular control via muscle synergies, i.e., coordinated muscle co-contractions, is considered one putative mechanism that enables organisms to learn muscle control in a simplified and generalizable action space. Drawing inspiration from this evolved motor control strategy, we use physiologically accurate human hand and leg models as a testbed for determining the extent to which a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning more complex tasks. We find in both cases that SAR-exploiting policies significantly outperform end-to-end reinforcement learning. Policies trained with SAR achieved robust locomotion on a wide set of terrains with high sample efficiency, while baseline approaches failed to learn meaningful behaviors. Additionally, policies trained with SAR on a multi-object manipulation task achieved >70% success, significantly outperforming baseline approaches (<20% success). Both of these SAR-exploiting policies also generalized zero-shot to out-of-domain environmental conditions, while policies that did not adopt SAR failed to generalize. Finally, we establish the generality of SAR on broader high-dimensional control problems using a robotic manipulation task set and a full-body humanoid locomotion task. To the best of our knowledge, this investigation is the first to present an end-to-end pipeline for discovering synergies and using this representation to learn high-dimensional continuous control across a wide diversity of tasks.
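To make the idea of a simplified, synergy-based action space concrete, the following is a minimal sketch and not the pipeline described above. The abstract does not state how synergies are extracted, so plain PCA (via SVD) is assumed here purely for illustration. The function names `fit_synergies` and `synergy_to_muscle`, the muscle count of 39, and the synthetic activation data are all hypothetical.

```python
import numpy as np


def fit_synergies(activations, n_synergies):
    """Fit a linear synergy basis to recorded muscle activations.

    activations: array of shape (T, n_muscles), e.g. logged while a policy
    solves a simpler task. PCA is an assumption made for this sketch.
    Returns (mean, basis) with basis of shape (n_synergies, n_muscles).
    """
    mean = activations.mean(axis=0)
    centered = activations - mean
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_synergies]


def synergy_to_muscle(z, mean, basis):
    """Decode a low-dimensional synergy action into muscle activations."""
    # Clip to [0, 1], assuming activations are normalized to that range.
    return np.clip(mean + z @ basis, 0.0, 1.0)


# Synthetic stand-in for recorded activations (hypothetical data):
# 39 muscles driven by 4 hidden co-contraction patterns.
rng = np.random.default_rng(0)
n_muscles, n_latent, n_steps = 39, 4, 500
mixing = rng.normal(size=(n_latent, n_muscles))
latent = rng.normal(size=(n_steps, n_latent))
activations = 0.5 + 0.05 * latent @ mixing

mean, basis = fit_synergies(activations, n_synergies=n_latent)

# A policy would now output a 4-dimensional action instead of 39 values.
z = rng.normal(scale=0.1, size=n_latent)
muscle_cmd = synergy_to_muscle(z, mean, basis)
```

In this sketch, a policy for a harder task would act in the low-dimensional `z` space, with `synergy_to_muscle` expanding each action into full muscle activations before it is sent to the simulator.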