Soccer presents a significant challenge for humanoid robots, demanding tightly integrated perception-action capabilities for tasks such as perception-guided kicking and whole-body balance control. Existing approaches suffer either from inter-module instability in modular pipelines or from conflicting training objectives in end-to-end frameworks. We propose Perception-Action integrated Decision-making (PAiD), a progressive architecture that decomposes soccer skill acquisition into three stages: motion-skill acquisition via human motion tracking, lightweight perception-action integration for positional generalization, and physics-aware sim-to-real transfer. This staged decomposition establishes stable foundational skills, avoids reward conflicts during perception integration, and minimizes sim-to-real gaps. Experiments on the Unitree G1 demonstrate high-fidelity human-like kicking with robust performance under diverse conditions (static or rolling balls, various ball positions, and external disturbances), while maintaining consistent execution across indoor and outdoor scenarios. Our divide-and-conquer strategy advances robust humanoid soccer capabilities and offers a scalable framework for complex embodied skill acquisition. The project page is available at https://soccer-humanoid.github.io/.