Humanoid table tennis (TT) demands rapid perception, proactive whole-body motion, and agile footwork under strict timing, capabilities that remain difficult for end-to-end control policies. We propose a reinforcement learning (RL) framework that maps ball-position observations directly to whole-body joint commands for both arm striking and leg locomotion, strengthened by predictive signals and dense, physics-guided rewards. A lightweight learned predictor, fed with recent ball positions, estimates future ball states and augments the policy's observations for proactive decision-making. During training, a physics-based predictor supplies precise future states to construct dense, informative rewards that drive effective exploration. The resulting policy attains strong performance across varied serve ranges (hit rate $\geq$ 96% and success rate $\geq$ 92%) in simulation. Ablation studies confirm that both the learned predictor and the predictive reward design are critical for end-to-end learning. Deployed zero-shot on a physical Booster T1 humanoid with 23 revolute joints, the policy produces coordinated lateral and forward-backward footwork with accurate, fast returns, suggesting a practical path toward versatile, competitive humanoid TT. We have open-sourced our RL training code at: https://github.com/purdue-tracelab/TTRL-ICRA2026
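The observation-augmentation idea above can be sketched minimally as follows. This is a hypothetical illustration, not the paper's implementation: the paper trains a lightweight network as the predictor, whereas here a constant-velocity extrapolation stands in for it, and the class and function names (`BallPredictor`, `augment_observation`) are invented for this sketch.

```python
import numpy as np
from collections import deque

class BallPredictor:
    """Hypothetical stand-in for the learned ball-state predictor:
    estimates a future ball position from a short history of recent
    observations (constant-velocity extrapolation here; the paper
    trains a small network on ball-position histories instead)."""

    def __init__(self, history_len=3):
        self.history = deque(maxlen=history_len)

    def update(self, ball_pos):
        # Record the latest observed ball position.
        self.history.append(np.asarray(ball_pos, dtype=float))

    def predict(self, horizon_s, dt=1 / 50):
        # With fewer than two samples, fall back to the last observation.
        if len(self.history) < 2:
            return np.array(self.history[-1])
        # Finite-difference velocity from the two most recent samples,
        # then linear extrapolation over the prediction horizon.
        vel = (self.history[-1] - self.history[-2]) / dt
        return self.history[-1] + vel * horizon_s

def augment_observation(proprio, ball_pos, predictor, horizon_s=0.2):
    """Concatenate the current ball position and a predicted future
    ball state onto the proprioceptive observation vector, so the
    policy can act proactively rather than reactively."""
    predictor.update(ball_pos)
    future = predictor.predict(horizon_s)
    return np.concatenate([proprio, np.asarray(ball_pos, dtype=float), future])
```

At deployment the same predictor runs on real ball-tracking output, so the policy's input interface is unchanged between simulation and hardware.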