The continuing rise in cyberattacks, together with a shortage of skilled cybersecurity professionals to counter them, shows the need for automated tools that can detect attacks effectively. Attackers disguise their actions and launch multi-step attacks, which are difficult to detect. Improving defensive tools therefore requires calibrating them against a well-trained attacker. In this work, we propose a model of an attacking agent and its environment, and we evaluate the agent's performance using three variants of Q-Learning: basic Q-Learning, Naive Q-Learning, and Double Q-Learning. The attacking agent is trained to exfiltrate data from a network in which every host has a non-zero detection probability. Results show that the Double Q-Learning agent has the best overall performance, achieving the goal in $70\%$ of the interactions.
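For readers unfamiliar with the algorithms named above, the following is a minimal sketch of the textbook tabular update rules for Q-Learning and Double Q-Learning. It is not the implementation evaluated in this work: the function names, the `(state, action)` table layout, and the default values of `alpha` and `gamma` are illustrative assumptions, and the Naive Q-Learning variant is omitted because the abstract does not define it.

```python
import random
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, next_actions,
                      alpha=0.1, gamma=0.9):
    """Textbook tabular Q-learning update (hyperparameters are assumptions)."""
    # Bootstrap from the highest-valued action available in the next state;
    # a terminal state (no actions) contributes a value of 0.
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error


def double_q_update(q_a, q_b, state, action, reward, next_state, next_actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """Textbook Double Q-learning update over two tables.

    One table selects the greedy next action and the other evaluates it,
    which reduces the overestimation bias of the single-table max.
    """
    # Pick at random which table is updated on this step.
    if rng.random() < 0.5:
        update, evaluate = q_a, q_b
    else:
        update, evaluate = q_b, q_a

    if next_actions:
        greedy = max(next_actions, key=lambda a: update[(next_state, a)])
        target_value = evaluate[(next_state, greedy)]
    else:
        target_value = 0.0  # terminal state

    td_error = reward + gamma * target_value - update[(state, action)]
    update[(state, action)] += alpha * td_error


# Hypothetical usage: states and actions are placeholder labels.
q = defaultdict(float)
q_learning_update(q, "host_1", "scan", 0.0, "host_2", ["scan", "exfiltrate"])
```

In an attacker setting like the one described, a detection event would typically be modeled as a terminal transition with a negative reward, but that reward design is an assumption here and not a detail stated in the abstract.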