Robust Adversarial Attacks Detection based on Explainable Deep Reinforcement Learning For UAV Guidance and Planning

Adversarial attacks pose a growing danger to Uncrewed Aerial Vehicle (UAV) agents operating in public spaces. Controlling and guiding these UAVs with AI-based techniques, and Deep Learning (DL) approaches in particular, can improve performance but raises concerns about the safety of those techniques and their vulnerability to adversarial attacks. By confusing the agent's decision-making process, such attacks can seriously compromise the safety of the UAV. This paper proposes an innovative approach based on the explainability of DL methods to build an efficient detector that protects these DL schemes, and the UAVs adopting them, from attacks. The agent uses a Deep Reinforcement Learning (DRL) scheme for guidance and planning: it is trained with Deep Deterministic Policy Gradient (DDPG) with Prioritised Experience Replay (PER), and an Artificial Potential Field (APF) is used to improve training times and obstacle avoidance performance. A simulated environment for explainable DRL-based UAV planning and guidance, including obstacles and adversarial attacks, is built. The adversarial attacks are generated with the Basic Iterative Method (BIM) and reduce the obstacle course completion rate from 97\% to 35\%. Two adversarial attack detectors are proposed to counter this reduction. The first is a Convolutional Neural Network Adversarial Detector (CNN-AD), which achieves a detection accuracy of 80\%. The second uses a Long Short-Term Memory (LSTM) network and achieves an accuracy of 91\% with faster computing times than the CNN-AD, allowing for real-time adversarial detection.
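
For readers unfamiliar with BIM, the general idea can be sketched as follows: take several small signed-gradient steps on the input and, after each step, project the result back into a small L-infinity ball around the clean input. This is only an illustrative sketch, not the paper's implementation. The linear toy policy `W`, the attack objective (pushing the first action component upward), and the values of `eps`, `alpha` and `steps` are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np

def bim_attack(x, grad_fn, eps=0.1, alpha=0.02, steps=10, lo=-1.0, hi=1.0):
    """Basic Iterative Method: repeated signed-gradient ascent steps, each
    projected back into the L-infinity ball of radius eps around x."""
    x = np.asarray(x, dtype=float)
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # ascend the attack loss
        x_adv = np.clip(x_adv, x - eps, x + eps)         # stay within eps of the clean input
        x_adv = np.clip(x_adv, lo, hi)                   # stay in the valid observation range
    return x_adv

# Hypothetical stand-in for the agent: a linear policy mapping a
# 4-dimensional observation to a 2-dimensional action (assumption).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))

def policy(obs):
    return W @ obs

# Assumed attack objective: increase the first action component.
# For this linear policy the gradient w.r.t. the observation is W[0].
def grad_fn(obs):
    return W[0]

x_clean = rng.uniform(-0.5, 0.5, size=4)
x_adv = bim_attack(x_clean, grad_fn)

print("clean action:    ", policy(x_clean))
print("perturbed action:", policy(x_adv))
print("max perturbation:", np.max(np.abs(x_adv - x_clean)))
```

In a real DRL setting, `grad_fn` would be obtained by backpropagating an attack loss through the trained actor network rather than from a closed-form expression.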
