Reinforcement learning (RL) and deep reinforcement learning (DRL) have recently been applied to develop autonomous agents for cyber network operations (CyOps), where the agents are trained in a representative environment using RL and, in particular, DRL algorithms. The training environment must simulate, with high fidelity, the CyOps that the agent aims to learn and accomplish. Such a simulator is hard to build because of the extreme complexity of the cyber environment. The trained agent must also generalize to network variations, because operational cyber networks change constantly. In this work, we use the red agent case to discuss these two issues. We elaborate on their essential requirements and potential solution options, illustrated by some preliminary experiments in a Cyber Gym for Intelligent Learning (CyGIL) testbed.