JuggleRL: Mastering Ball Juggling with a Quadrotor via Deep Reinforcement Learning

Aerial robots interacting with objects must perform precise, contact-rich maneuvers under uncertainty. In this paper, we study the problem of aerial ball juggling using a quadrotor equipped with a racket, a task that demands accurate timing, stable control, and continuous adaptation. We propose JuggleRL, the first reinforcement learning-based system for aerial juggling. It learns closed-loop policies in large-scale simulation using systematic calibration of quadrotor and ball dynamics to reduce the sim-to-real gap. The training incorporates reward shaping to encourage racket-centered hits and sustained juggling, as well as domain randomization over ball position and coefficient of restitution to enhance robustness and transferability. The learned policy outputs mid-level commands executed by a low-level controller and is deployed zero-shot on real hardware, where an enhanced perception module with a lightweight communication protocol reduces delays in high-frequency state estimation and ensures real-time control. Experiments show that JuggleRL achieves an average of $311$ hits over $10$ consecutive trials in the real world, with a maximum of $462$ hits observed, far exceeding a model-based baseline that reaches at most $14$ hits with an average of $3.1$. Moreover, the policy generalizes to unseen conditions, successfully juggling a lighter $5$ g ball with an average of $145.9$ hits. This work demonstrates that reinforcement learning can empower aerial robots with robust and stable control in dynamic interaction tasks.
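The two training ingredients named above, domain randomization over ball position and coefficient of restitution, and reward shaping toward racket-centered hits, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the randomization ranges, the racket radius, the reward weights, and the simplified one-dimensional impact model are all assumptions chosen for illustration only.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class EpisodeParams:
    ball_offset_xy: tuple  # initial horizontal ball offset from racket centre (m)
    restitution: float     # coefficient of restitution for ball-racket contact


def sample_episode_params(rng, max_offset=0.05, restitution_range=(0.75, 0.85)):
    """Domain randomization: draw per-episode ball position and restitution.

    The ranges are hypothetical placeholders, not values from the paper.
    """
    x = rng.uniform(-max_offset, max_offset)
    y = rng.uniform(-max_offset, max_offset)
    e = rng.uniform(*restitution_range)
    return EpisodeParams((x, y), e)


def post_impact_velocity(v_ball_z, v_racket_z, restitution):
    """Simplified 1-D impact against a much heavier racket (an assumption).

    The relative normal velocity is reversed and scaled by the restitution.
    """
    return v_racket_z - restitution * (v_ball_z - v_racket_z)


def hit_reward(hit_xy, racket_radius=0.1, hit_bonus=1.0, centre_weight=1.0):
    """Shaped reward: a bonus per hit plus a term that grows toward the centre.

    Radius and weights are hypothetical; a miss (outside the racket) earns 0.
    """
    d = math.hypot(*hit_xy)
    if d > racket_radius:
        return 0.0
    return hit_bonus + centre_weight * (1.0 - d / racket_radius)


if __name__ == "__main__":
    rng = random.Random(0)
    params = sample_episode_params(rng)
    # Ball falling at 3 m/s onto a stationary racket.
    v_out = post_impact_velocity(-3.0, 0.0, params.restitution)
    print(params, v_out, hit_reward(params.ball_offset_xy))
```

Under these assumptions, a policy trained across many sampled `EpisodeParams` never sees one fixed bounce behaviour. This is the intuition behind randomizing restitution for transfer to a real ball whose properties are only approximately known.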
