Aerial manipulators enable physical interaction in hard-to-reach environments; however, direct whole-body aerial manipulation under rapid arm motion, payload changes, and related unknown dynamic uncertainty remains largely unsolved. We present a hierarchical control framework that combines Reinforcement Learning (RL) with an inner-loop dynamics estimator to address this problem. The RL outer loop maps desired 6-degree-of-freedom (DoF) end-effector targets to coordinated whole-body commands, enabling direct task-driven control without relying on a fully accurate coupled dynamic model in the policy layer. The inner loop then tracks these commands and uses a dynamics estimator to compensate for transient inertial shifts and uncertainty during execution, without requiring knowledge of the system model. We validate the proposed approach in hardware experiments on a custom quadrotor equipped with a 3-DoF manipulator under varying payload conditions. Compared with RL+PID and RL+INDI+PID baselines, the proposed method reduces end-effector tracking error and improves task success rate across the tested hardware conditions. These results show that combining learned whole-body coordination with estimator-based low-level compensation improves the precision and robustness of aerial manipulation under changing operating conditions.
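The role of the inner loop can be illustrated with a minimal sketch on a single vertical axis. This is not the estimator used in the paper, whose details are not given here; it assumes a generic disturbance-observer-style scheme in which a low-pass filter estimates the lumped acceleration error between a nominal model and the measured acceleration. The reference `z_ref` stands in for a command issued by the RL outer loop, and the gains, masses, and the function name `simulate` are hypothetical values chosen for illustration.

```python
def simulate(use_estimator, steps=1000, dt=0.01):
    """Track a height reference while the payload mass changes mid-flight.

    Returns the absolute tracking error at the end of the run.
    All numerical values are illustrative assumptions.
    """
    g = 9.81            # gravity, m/s^2
    m_nominal = 1.0     # mass assumed by the controller, kg
    kp, kd = 16.0, 8.0  # PD gains on position and velocity error
    alpha = 0.2         # low-pass gain of the disturbance estimate
    z_ref = 1.0         # stand-in for a command from the outer-loop policy

    z, v, d_hat = 0.0, 0.0, 0.0
    for k in range(steps):
        # Payload is picked up after 2 s: true mass jumps from 1.0 to 1.5 kg.
        m_true = 1.0 if k < 200 else 1.5

        # Desired acceleration from the PD tracking law.
        a_cmd = kp * (z_ref - z) - kd * v

        # Thrust from the nominal model, minus the estimated disturbance.
        u = m_nominal * (a_cmd + g - d_hat)

        # True acceleration; in hardware this would come from an accelerometer.
        a = u / m_true - g

        if use_estimator:
            # Lumped disturbance: measured minus nominally predicted acceleration.
            d_meas = a - (u / m_nominal - g)
            d_hat += alpha * (d_meas - d_hat)

        # Semi-implicit Euler integration of the true dynamics.
        v += a * dt
        z += v * dt

    return abs(z_ref - z)


if __name__ == "__main__":
    print("final error without estimator:", simulate(False))
    print("final error with estimator:   ", simulate(True))
```

In this toy setting the PD-only controller settles with a steady offset after the payload change, because the nominal model underestimates the required thrust, whereas the estimator-compensated loop removes that offset without being told the new mass.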