Learning effective continuous control policies in high-dimensional systems, including musculoskeletal agents, remains a significant challenge. Over the course of biological evolution, organisms have developed robust mechanisms for overcoming this complexity and learning highly sophisticated motor control strategies. What accounts for this robust behavioral flexibility? Modular control via muscle synergies, i.e., coordinated muscle co-contractions, is one putative mechanism that enables organisms to learn muscle control in a simplified and generalizable action space. Drawing inspiration from this evolved motor control strategy, we use physiologically accurate human hand and leg models as a testbed for determining the extent to which a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning more complex tasks. We find in both cases that SAR-exploiting policies significantly outperform end-to-end reinforcement learning. Policies trained with SAR achieved robust locomotion on a wide set of terrains with high sample efficiency, while baseline approaches failed to learn meaningful behaviors. Additionally, policies trained with SAR on a multi-object manipulation task achieved >70% success, whereas baseline approaches achieved <20% success. Both SAR-exploiting policies also generalized zero-shot to out-of-domain environmental conditions, while policies that did not adopt SAR failed to generalize. Finally, we establish the generality of SAR on broader high-dimensional control problems using a robotic manipulation task set and a full-body humanoid locomotion task. To the best of our knowledge, this investigation is the first to present an end-to-end pipeline for discovering synergies and using this representation to learn high-dimensional continuous control across a wide diversity of tasks.
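To make the idea of a synergy-based action space concrete, the following is a minimal sketch, not the authors' pipeline. The abstract does not say how the SAR is computed, so this sketch assumes plain PCA (via SVD) over muscle activations logged from a simpler task. The function names `fit_synergies` and `synergy_to_muscle`, the muscle count, the number of synergies, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_synergies(activations: np.ndarray, n_synergies: int):
    """Fit a linear synergy basis with PCA (an assumed stand-in for SAR).

    activations: array of shape (n_samples, n_muscles).
    Returns (mean, basis) with basis of shape (n_synergies, n_muscles).
    """
    mean = activations.mean(axis=0)
    centered = activations - mean
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_synergies]


def synergy_to_muscle(z: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Map a low-dimensional synergy action z to muscle activations in [0, 1]."""
    return np.clip(mean + z @ basis, 0.0, 1.0)


# Synthetic stand-in for activations recorded on a simpler task.
# All sizes here are arbitrary illustration values, not taken from the paper.
rng = np.random.default_rng(0)
n_muscles, n_true = 20, 4
true_basis = rng.normal(size=(n_true, n_muscles))
latent = rng.normal(size=(500, n_true))
activations = 0.5 + 0.05 * latent @ true_basis  # shape (500, 20)

mean, basis = fit_synergies(activations, n_synergies=4)

# A policy would now output a 4-dimensional action instead of 20 muscle commands.
z = np.zeros(4)
muscle_action = synergy_to_muscle(z, mean, basis)
```

Under these assumptions, a policy trained on a harder task would act in the 4-dimensional space of `z`, and `synergy_to_muscle` would expand each action into a full muscle command before it is sent to the simulator.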