Achieving human-level competitive intelligence and physical agility in humanoid robots remains a major challenge, particularly in contact-rich and highly dynamic tasks such as boxing. While Multi-Agent Reinforcement Learning (MARL) offers a principled framework for strategic interaction, its direct application to humanoid control is hindered by high-dimensional contact dynamics and the absence of strong physical motion priors. We propose RoboStriker, a hierarchical three-stage framework that enables fully autonomous humanoid boxing by decoupling high-level strategic reasoning from low-level physical execution. The framework first learns a comprehensive repertoire of boxing skills by training a single-agent motion tracker on human motion capture data. These skills are subsequently distilled into a structured latent manifold, regularized by projecting the Gaussian-parameterized distribution onto a unit hypersphere. This topological constraint effectively confines exploration to the subspace of physically plausible motions. In the final stage, we introduce Latent-Space Neural Fictitious Self-Play (LS-NFSP), where competing agents learn competitive tactics by interacting within the latent action space rather than the raw motor space, significantly stabilizing multi-agent training. Experimental results demonstrate that RoboStriker achieves superior competitive performance in simulation and exhibits sim-to-real transfer. Our website is available at RoboStriker.
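The hypersphere regularization described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: the function name `sample_spherical_latent`, the latent dimension, and the diagonal-Gaussian parameterization are all hypothetical, and only the core idea — sampling from a Gaussian-parameterized latent and projecting it onto the unit hypersphere so that downstream exploration stays on a bounded manifold — is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_spherical_latent(mu: np.ndarray, log_std: np.ndarray) -> np.ndarray:
    """Draw z ~ N(mu, diag(exp(log_std)^2)) via the reparameterization
    trick, then project z onto the unit hypersphere (L2 normalization),
    confining latent actions to a bounded, topologically simple manifold."""
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(log_std) * eps                         # Gaussian sample
    return z / np.linalg.norm(z, axis=-1, keepdims=True)   # unit-norm projection

# Hypothetical 8-dimensional latent with a fixed standard deviation.
mu = np.zeros(8)
log_std = np.full(8, -1.0)
z = sample_spherical_latent(mu, log_std)
print(np.linalg.norm(z))  # always 1.0 up to floating-point error
```

Because every projected latent has unit norm, a high-level policy exploring this space cannot emit out-of-distribution magnitudes, which is one plausible reading of how the constraint keeps exploration within physically plausible motions.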