Robust Adversarial Attacks Detection based on Explainable Deep Reinforcement Learning For UAV Guidance and Planning

Adversarial attacks pose a growing danger to Uncrewed Aerial Vehicle (UAV) agents operating in public spaces. Using AI-based techniques, and Deep Learning (DL) approaches in particular, to control and guide these UAVs can improve performance, but it also raises concerns about the safety of those techniques and their vulnerability to adversarial attacks. By confusing the agent's decision-making process, such attacks can seriously compromise the safety of the UAV. This paper proposes an innovative approach based on the explainability of DL methods to build an efficient detector that protects these DL schemes, and the UAVs adopting them, from attacks. The agent adopts a Deep Reinforcement Learning (DRL) scheme for guidance and planning. It is trained with a Deep Deterministic Policy Gradient (DDPG) scheme with Prioritised Experience Replay (PER), which utilises an Artificial Potential Field (APF) to improve training times and obstacle avoidance performance. A simulated environment for explainable DRL-based UAV planning and guidance, including obstacles and adversarial attacks, is built. The adversarial attacks are generated with the Basic Iterative Method (BIM) and reduce obstacle course completion rates from 97\% to 35\%. Two adversarial attack detectors are proposed to counter this reduction. The first is a Convolutional Neural Network Adversarial Detector (CNN-AD), which achieves a detection accuracy of 80\%. The second utilises a Long Short-Term Memory (LSTM) network and achieves an accuracy of 91\% with faster computing times than the CNN-AD, allowing for real-time adversarial detection.
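
For readers unfamiliar with the Basic Iterative Method named above, the following is a minimal sketch of its general form: repeated signed-gradient steps, each followed by clipping so that the perturbation stays within a fixed L-infinity budget. It is not the paper's implementation. The function name `bim_attack`, the parameters `eps`, `alpha` and `steps`, and the toy linear model with a squared-error loss are all assumptions chosen for illustration; in the setting described above, the gradient would presumably come from the trained DRL agent instead.

```python
import numpy as np


def bim_attack(x, grad_fn, eps=0.1, alpha=0.02, steps=10):
    """Generic BIM sketch (hypothetical helper, not the paper's code).

    x       : clean input (e.g. an observation vector)
    grad_fn : callable returning d(loss)/d(input) at a given input
    eps     : maximum L-infinity size of the total perturbation
    alpha   : step size of each signed-gradient step
    steps   : number of iterations
    """
    x = np.asarray(x, dtype=float)
    x_adv = x.copy()
    for _ in range(steps):
        # Step in the direction that increases the loss.
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        # Project back into the eps-ball around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


# Toy stand-in for a model (assumption): linear map W with a squared-error loss
# against a fixed reference output y.
W = np.array([[1.0, -2.0],
              [0.5, 1.5]])
y = np.zeros(2)


def loss(x):
    r = W @ x - y
    return 0.5 * float(r @ r)


def loss_grad(x):
    # Analytic gradient of 0.5 * ||W x - y||^2 with respect to x.
    return W.T @ (W @ x - y)


x_clean = np.array([0.3, -0.1])
x_adv = bim_attack(x_clean, loss_grad, eps=0.1, alpha=0.02, steps=10)

print("clean loss:      ", loss(x_clean))
print("adversarial loss:", loss(x_adv))
print("max perturbation:", np.max(np.abs(x_adv - x_clean)))
```

The gradient is passed in as a callable so that the same loop can be reused with any differentiable model; only `grad_fn` would need to change.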
