AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN

Existing defence mechanisms are highly effective at mitigating rule-based Denial-of-Service (DoS) attacks, relying on predefined signatures and static heuristics to identify and block malicious traffic. However, the emergence of AI-driven techniques poses new challenges to the security of software-defined networking (SDN) and may undermine these defences. In this paper, we introduce AdaDoS, an adaptive attack model that disrupts network operations while evading existing DoS detectors through adversarial reinforcement learning (RL). Specifically, AdaDoS models the problem as a competitive game between an attacker, whose goal is to obstruct network traffic without being detected, and a detector, which aims to identify malicious traffic. AdaDoS solves this game by dynamically adjusting its attack strategy based on feedback from the SDN and the detector. Additionally, recognising that attackers typically have less information than defenders, AdaDoS formulates the DoS-like attack as a partially observable Markov decision process (POMDP), in which the attacker has access only to delay information between the attacker and victim nodes. We address this challenge with a novel reciprocal learning module, in which a student agent with limited observations improves its performance by learning from a teacher agent that has full observability of the SDN environment. AdaDoS represents the first application of RL to developing DoS-like attack sequences, and it adaptively evades both machine learning-based and rule-based DoS-like attack detectors.
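To make the teacher-student idea concrete, the following is a minimal sketch of one way a partially observing student could be pulled towards a fully observing teacher. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the loss, so KL-divergence policy distillation is assumed here, and the state fields (`delay_ms`, `link_load`, `detector_score`) and all function names are hypothetical. Only the student-from-teacher direction described in the abstract is shown.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def partial_observation(full_state):
    """Hypothetical POMDP observation: the student sees only the delay.

    The field names are assumptions made for this sketch.
    """
    return {"delay_ms": full_state["delay_ms"]}


def distillation_loss(teacher_logits, student_logits):
    """Assumed training signal: how far the student's action distribution
    is from the teacher's. Zero when the two policies agree exactly."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


if __name__ == "__main__":
    # Hypothetical full state, visible to the teacher only.
    state = {"delay_ms": 12.5, "link_load": 0.7, "detector_score": 0.2}
    print(partial_observation(state))
    print(distillation_loss([2.0, 0.5, -1.0], [0.1, 0.1, 0.1]))
```

In a sketch like this, the teacher's logits would come from a policy conditioned on the full state and the student's from a policy conditioned on `partial_observation(state)`; minimising the loss moves the student towards the teacher's behaviour without giving it the hidden fields.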
