Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact-based interaction. However, the motion of the drone and that of its manipulator are tightly coupled: even small attitude changes caused by wind or control imperfections shift the end-effector away from its intended path. This coupling makes reliable tracking difficult and limits the direct use of learning-based arm controllers originally designed for fixed-base robots. These effects appear consistently in our tests whenever the UAV body drifts or makes rapid attitude corrections. To address this, we develop a reinforcement-learning (RL) framework that combines a Transformer critic with double deep Q-learning (DDQN). Its core is an adaptive beam-search planner that runs a short-horizon beam search over candidate control sequences, using the learned critic as the forward estimator. The controller can thus anticipate the end-effector's motion through simulated rollouts rather than executing those actions directly on the actual model, realizing a software-in-the-loop (SITL) approach. The lookahead relies on value estimates from the Transformer critic, which processes short sequences of states, while the DDQN backbone provides the one-step targets needed to keep learning stable. Evaluated on a 3-DoF aerial manipulator under identical training conditions, the proposed meta-adaptive planner shows the strongest overall performance, with a 10.2% increase in reward, a reduction in mean tracking error from about 6% to 3%, and a 29.6% improvement in the combined reward-error metric relative to the DDQN baseline. Compared with the fixed-beam and Transformer-only variants, our method also tracks the target tip trajectory more stably, maintaining a 5 cm tracking error when the drone base drifts under external disturbances.
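The short-horizon beam search described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: `step_fn` stands in for the simulated rollout model, `q_fn` for the learned critic, and partial sequences are scored by their discounted simulated reward plus a bootstrapped critic value. The beam width is fixed in this sketch; the rule by which the proposed planner adapts it is not specified in the abstract and is not modeled.

```python
from typing import Callable, List, Sequence, Tuple

State = float   # toy 1-D tip position; a real state would be a vector
Action = int


def beam_search_plan(
    state: State,
    actions: Sequence[Action],
    step_fn: Callable[[State, Action], Tuple[State, float]],  # hypothetical simulator
    q_fn: Callable[[State, Action], float],                   # hypothetical critic
    horizon: int = 3,
    beam_width: int = 4,
    gamma: float = 0.99,
) -> Tuple[List[Action], float]:
    """Return the best action sequence found and its score."""
    if horizon < 1 or beam_width < 1:
        raise ValueError("horizon and beam_width must be at least 1")

    # Each beam entry: (discounted simulated return so far, actions, state).
    beam: List[Tuple[float, List[Action], State]] = [(0.0, [], state)]
    best_seq: List[Action] = []
    best_score = float("-inf")

    for depth in range(horizon):
        candidates = []
        for ret, seq, s in beam:
            for a in actions:
                # Roll the action out in simulation only, never on hardware.
                s_next, r = step_fn(s, a)
                new_ret = ret + (gamma ** depth) * r
                # Bootstrap the value beyond the lookahead from the critic.
                tail = max(q_fn(s_next, b) for b in actions)
                score = new_ret + (gamma ** (depth + 1)) * tail
                candidates.append((score, new_ret, seq + [a], s_next))

        # Keep only the beam_width highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        top = candidates[:beam_width]
        beam = [(ret, seq, s) for _, ret, seq, s in top]
        best_score, best_seq = top[0][0], top[0][2]

    return best_seq, best_score


# Toy stand-ins (assumptions): the tip should reach position 0.0.
def toy_step(pos: State, action: Action) -> Tuple[State, float]:
    next_pos = pos + 0.1 * action
    return next_pos, -abs(next_pos)      # reward is negative tracking error


def toy_critic(pos: State, action: Action) -> float:
    return -abs(pos + 0.1 * action)      # crude one-step value estimate


plan, score = beam_search_plan(0.5, (-1, 0, 1), toy_step, toy_critic)
print(plan, score)
```

In a receding-horizon controller only the first action of `plan` would typically be executed before replanning from the next observed state. In this toy setup the plan steps the tip toward the target at every stage.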