A significant challenge for autonomous cyber defence is ensuring a defensive agent's ability to generalise across diverse network topologies and configurations. This capability is necessary for agents to remain effective when deployed in dynamically changing environments, such as an enterprise network where devices may frequently join and leave. Standard approaches to deep reinforcement learning, in which policies are parameterised by a fixed-input multi-layer perceptron (MLP), expect fixed-size observation and action spaces. In autonomous cyber defence, this makes it hard to develop agents that generalise to environments with network topologies different from those seen in training, as the number of nodes determines the natural size of the observation and action spaces. To overcome this limitation, we reframe the problem of autonomous network defence using entity-based reinforcement learning, where the observation and action space of an agent are decomposed into a collection of discrete entities. This framework enables the use of policy parameterisations specialised for compositional generalisation. We train a Transformer-based policy on the Yawning Titan cyber-security simulation environment and test its generalisation capabilities across various network topologies. We demonstrate that this approach significantly outperforms an MLP-based policy when training across fixed-size networks of varying topologies, and matches its performance when training on a single network. We also demonstrate the potential for zero-shot generalisation to networks of a different size from those seen in training. These findings highlight the potential for entity-based reinforcement learning to advance the field of autonomous cyber defence by providing more generalisable policies capable of handling variations in real-world network environments.
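To illustrate why an entity-based, attention-style policy sidesteps the fixed-size constraint of an MLP, here is a minimal numpy sketch. It assumes each network node is an entity described by a fixed-length feature vector, and applies a single self-attention layer followed by a shared per-node head. All names, dimensions, and the single-head architecture are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_feat, d_model, n_actions = 5, 16, 3  # assumed per-node feature and logit sizes

# Illustrative fixed weights shared across all entities.
W_in = rng.normal(size=(d_feat, d_model)) * 0.1
W_q = rng.normal(size=(d_model, d_model)) * 0.1
W_k = rng.normal(size=(d_model, d_model)) * 0.1
W_v = rng.normal(size=(d_model, d_model)) * 0.1
W_head = rng.normal(size=(d_model, n_actions)) * 0.1

def self_attention(X):
    """Scaled dot-product self-attention over a variable-size set of entities."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_model)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # row-wise softmax
    return A @ V

def policy_logits(node_features):
    """Map (num_nodes, d_feat) observations to (num_nodes, n_actions) logits."""
    H = node_features @ W_in
    H = self_attention(H)
    return H @ W_head

# The same weights handle networks of any size: the action space grows with
# the number of entities, with no retraining or re-architecting.
logits_small = policy_logits(rng.normal(size=(4, d_feat)))
logits_large = policy_logits(rng.normal(size=(10, d_feat)))
print(logits_small.shape, logits_large.shape)  # (4, 3) (10, 3)
```

Because every weight matrix acts per-entity (or via attention across entities), the parameter count is independent of the number of nodes, which is what makes zero-shot transfer to differently sized networks possible in principle; an MLP, by contrast, would need an input layer whose width is tied to a specific node count.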