ACDZero: Graph-Embedding-Based Tree Search for Mastering Automated Cyber Defense

Automated cyber defense (ACD) seeks to protect computer networks with minimal or no human intervention, reacting to intrusions by taking corrective actions such as isolating hosts, resetting services, deploying decoys, or updating access controls. However, existing ACD approaches, such as deep reinforcement learning (RL), often struggle to explore complex networks with large state and decision spaces, and therefore require a large and costly number of samples. Motivated by the need for sample-efficient defense policies, we frame ACD in CAGE Challenge 4 (CAGE-4 / CC4) as a context-based partially observable Markov decision problem and propose a planning-centric defense policy based on Monte Carlo Tree Search (MCTS). The policy explicitly models the exploration-exploitation tradeoff in ACD and uses statistical sampling to guide exploration and decision making. We make novel use of graph neural networks (GNNs) to embed network observations, represented as attributed graphs, which enables permutation-invariant reasoning over hosts and their relationships. To keep the search practical in complex search spaces, we guide MCTS with learned graph embeddings and priors over graph-edit actions, combining model-free generalization and policy distillation with look-ahead planning. We evaluate the resulting agent on CC4 scenarios involving diverse network structures and adversary behaviors, and show that our search-guided, graph-embedding-based planning improves defense reward and robustness relative to state-of-the-art RL baselines.

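The abstract describes MCTS guided by learned priors over actions but does not state the selection rule. As a minimal sketch, assuming an AlphaZero-style PUCT rule (an assumption, not confirmed by the text), the following shows how a prior can steer which defensive action the search tries next. The `Node` class, the action names, the prior values, the constant `c_puct`, and the simulated return of -1.0 are all hypothetical and chosen only for illustration; in the described system the priors would come from the GNN-based policy rather than being hard-coded.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node. `prior` stands in for a learned policy output."""
    prior: float
    visit_count: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)

    def mean_value(self) -> float:
        # Average simulated return through this node (0.0 if never visited).
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node, c_puct: float = 1.25) -> float:
    # Exploitation: average return observed through this child.
    # Exploration: large for high-prior children that were rarely visited.
    exploration = (
        c_puct * child.prior * math.sqrt(parent.visit_count)
        / (1 + child.visit_count)
    )
    return child.mean_value() + exploration


def select_action(parent: Node, c_puct: float = 1.25) -> str:
    # Pick the child action with the highest PUCT score.
    return max(
        parent.children,
        key=lambda a: puct_score(parent, parent.children[a], c_puct),
    )


def backup(path: list, value: float) -> None:
    # Propagate one simulated return to every node on the visited path.
    for node in path:
        node.visit_count += 1
        node.value_sum += value


# Hypothetical priors over three defensive actions (illustrative values).
root = Node(prior=1.0)
root.children = {
    "isolate_host": Node(prior=0.6),
    "deploy_decoy": Node(prior=0.3),
    "reset_service": Node(prior=0.1),
}

backup([root], 0.0)                          # count the root visit first
first = select_action(root)                  # highest prior wins initially
backup([root, root.children[first]], -1.0)   # pretend that action went badly
second = select_action(root)                 # search shifts to another action
print(first, second)
```

After one poor simulated outcome for the highest-prior action, its mean value drops and the exploration term moves the search to the next most plausible action. This is the exploration-exploitation tradeoff the abstract refers to, reduced to a single selection step.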