Humans exhibit remarkable motor agility across dynamic skills such as running and jumping, which suggests considerable potential for humanoid robots in athletic locomotion. Among athletic sports, long rope skipping requires two rope turners to swing the rope cooperatively while adapting to a player whose jumping rhythm varies, making it a meaningful yet challenging task for humanoid robots. Although existing methods for humanoid sports have succeeded in single-agent, interaction-free settings such as running, dancing, and parkour, tasks that require precise coordination among multiple participants remain largely unexplored. To this end, we propose Marope, a multi-agent reinforcement learning (MARL) framework for cooperative long rope skipping with multiple humanoid robots. Marope trains its policies hierarchically: at the lower level, it learns decentralized rope manipulation policies through MARL, while at the upper level, a centralized scheduling policy is trained to coordinate the execution of the lower-level policies. To improve generalization across players with different behavioral styles, Marope further incorporates diverse jumping policies into cooperative game training. We evaluate our approach on Unitree G1 humanoid robots in both simulation and real-world settings. Experimental results show that Marope outperforms various baselines, achieving more efficient and stable rope manipulation as well as more robust and adaptable cooperation with varied players.
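The two-level structure described in the abstract can be pictured with a minimal control-loop sketch. It is only an illustration under stated assumptions, not Marope's implementation: the names `Turner`, `make_skill`, `scheduler`, and `step`, the skill labels, and the 0.6 s threshold are all hypothetical, and the trained policies are replaced by trivial placeholder functions. It also assumes that the upper level coordinates by choosing among discrete lower-level skills, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Observation = Sequence[float]
Action = List[float]
LowLevelPolicy = Callable[[Observation], Action]


@dataclass
class Turner:
    """One rope-turning robot holding its own bank of low-level skills (hypothetical)."""
    skills: Dict[str, LowLevelPolicy]

    def act(self, skill: str, local_obs: Observation) -> Action:
        # Decentralized execution: only this robot's local observation is used.
        return self.skills[skill](local_obs)


def make_skill(gain: float) -> LowLevelPolicy:
    """Placeholder for a trained policy: scales each observation entry by `gain`."""
    return lambda obs: [gain * x for x in obs]


def scheduler(global_obs: Observation) -> str:
    """Placeholder for the centralized upper-level policy.

    Assumption: global_obs[0] stands in for the player's estimated jump
    period in seconds, and 0.6 is an arbitrary illustrative threshold.
    """
    return "fast_swing" if global_obs[0] < 0.6 else "slow_swing"


def step(
    turners: Sequence[Turner],
    global_obs: Observation,
    local_obs: Sequence[Observation],
) -> Tuple[str, List[Action]]:
    """One control step: a single centralized choice, then per-robot actions."""
    skill = scheduler(global_obs)
    actions = [t.act(skill, obs) for t, obs in zip(turners, local_obs)]
    return skill, actions


# Usage: two turners sharing the same (placeholder) skill set.
turners = [
    Turner({"fast_swing": make_skill(2.0), "slow_swing": make_skill(0.5)})
    for _ in range(2)
]
skill, actions = step(turners, [0.5], [[1.0, 2.0], [3.0, 4.0]])
```

In this sketch the scheduler sees a global observation while each turner acts only on its own local observation, which mirrors the centralized-scheduling, decentralized-manipulation split; how the actual system defines observations, actions, and the scheduler's output is not stated in the abstract.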