In this paper, we investigate the operation of an aerial manipulator system, namely an Unmanned Aerial Vehicle (UAV) equipped with a controllable arm with two degrees of freedom, to carry out actuation tasks in flight. Our solution employs Q-learning to control the trajectory of the tip of the arm, also called the end-effector. More specifically, we develop a motion planning model based on Time To Collision (TTC), which enables a quadrotor UAV to navigate around obstacles while ensuring the manipulator's reachability. Additionally, we use model-based Q-learning to independently track and control the desired trajectory of the manipulator's end-effector, given an arbitrary baseline trajectory for the UAV platform. This combination enables a variety of actuation tasks, such as high-altitude welding, structural monitoring and repair, battery replacement, gutter cleaning, skyscraper cleaning, and power line maintenance, in hard-to-reach and risky environments while retaining compatibility with flight control firmware. Our reinforcement learning (RL) based control mechanism yields a robust control strategy that can handle uncertainties in the motion of the UAV. Specifically, our method achieves 92% accuracy in terms of average displacement error (i.e., the mean distance between the target and obtained trajectory points) using Q-learning with 15,000 episodes.
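To make the evaluation metric concrete, the following is a minimal sketch of how an average displacement error could be computed, following only the definition given above (the mean distance between paired target and obtained trajectory points). The function name, the 2D sample points, and the use of Euclidean distance are illustrative assumptions; how this error is mapped to the reported percentage accuracy is not specified here and is not modeled.

```python
import math


def average_displacement_error(target, obtained):
    """Mean Euclidean distance between paired trajectory points.

    Hypothetical helper: `target` and `obtained` are equal-length
    sequences of coordinate tuples (an assumed data layout).
    """
    if len(target) != len(obtained):
        raise ValueError("trajectories must have the same number of points")
    if not target:
        raise ValueError("trajectories must not be empty")
    # Sum the point-wise distances, then average over the trajectory.
    total = sum(math.dist(p, q) for p, q in zip(target, obtained))
    return total / len(target)


# Made-up sample data: a straight target path and a slightly offset result.
target = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
obtained = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.4)]
ade = average_displacement_error(target, obtained)
print(f"ADE = {ade:.4f}")
```

In this toy case the point-wise distances are 0, 0.3, and 0.4, so the error is their mean; the same function applies unchanged to 3D end-effector positions.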