Fast and robust ball kicking is a critical capability for humanoid soccer robots, yet learning it remains challenging because it requires rapid leg swings, postural stability on a single support foot, and robustness to noisy sensory input and external perturbations (e.g., opponents). This paper presents a reinforcement learning (RL)-based system that enables humanoid robots to kick a ball robustly and continually while adapting to different ball-goal configurations. The system extends a typical teacher-student training framework -- in which a "teacher" policy is trained with ground-truth state information and a "student" policy learns to mimic it from noisy, imperfect sensing -- into four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements -- tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement -- are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty. Extensive evaluations both in simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball-goal configurations. Ablation studies further show that constrained RL, noise modeling, and the adaptation stage are each necessary. The resulting system for continual humanoid ball kicking under imperfect perception also establishes a benchmark task for visuomotor skill learning in humanoid whole-body control.