We study a Stackelberg game between one attacker and one defender in a configurable environment. The defender picks a specific environment configuration. The attacker observes the configuration and attacks via Reinforcement Learning (RL), training an agent against the observed environment. The defender's goal is to find the environment configuration that minimizes the attacker's best achievable reward. We apply Evolutionary Diversity Optimization (EDO) to generate a diverse population of environments for training. Environments that clearly yield high attacker rewards are discarded and replaced by new offspring to avoid wasting training time. Diversity not only improves training quality but also suits our RL setting: RL agents tend to improve gradually, so an environment that looks slightly worse for the defender early in training may turn out to be better later. We demonstrate the effectiveness of our approach on a specific application, Active Directory (AD). AD is the default security management system for Windows domain networks. An AD environment describes an attack graph, where nodes represent computers, accounts, and other objects, and edges represent access relationships. The attacker aims to find the best attack path to reach the highest-privilege node. The defender can change the graph by removing a limited number of edges (revoking accesses). Our approach generates better defensive plans than the existing approach and scales better.
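The defender's loop described above can be illustrated with a minimal sketch. This sketch makes several assumptions that are not the authors' method. The RL attacker is replaced by an exact breadth-first-search attacker whose reward is a hypothetical `100 - hops` when the target is reachable and `0` otherwise. "Clearly high" is modeled by a hypothetical `margin` above the current generation's best reward. Diversity is reduced to rejecting duplicate plans. The gradual improvement of RL agents is not modeled at all. All names (`attacker_reward`, `edo_defend`, the toy graph) are hypothetical.

```python
import random
from collections import deque

Edge = tuple[str, str]


def attacker_reward(edges: set[Edge], entries: list[str], target: str) -> int:
    """Stand-in for RL training (assumption): exact best attack via BFS.

    Returns 0 if the target is unreachable, else 100 - (number of hops),
    so shorter attack paths give the attacker more reward.
    """
    adj: dict[str, list[str]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dist = {s: 0 for s in entries}
    queue = deque(entries)
    while queue:
        u = queue.popleft()
        if u == target:  # first time BFS pops the target = shortest path
            return 100 - dist[u]
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return 0


def edo_defend(edges, removable, budget, entries, target,
               pop_size=8, generations=40, margin=2, seed=0):
    """Hypothetical EDO-style search over edge-removal plans (defender side)."""
    assert 1 <= budget <= len(removable)
    rng = random.Random(seed)
    removable = sorted(removable)

    def evaluate(plan):
        # Attacker's best reward in the environment with `plan` revoked.
        return attacker_reward(set(edges) - plan, entries, target)

    def mutate(plan):
        # Offspring: swap one revoked edge for a currently unused one.
        unused = [e for e in removable if e not in plan]
        if not unused:
            return plan
        out = rng.choice(sorted(plan))
        return (plan - {out}) | {rng.choice(unused)}

    population = [frozenset(rng.sample(removable, budget)) for _ in range(pop_size)]
    best_plan = min(population, key=evaluate)
    best = evaluate(best_plan)
    for _ in range(generations):
        scores = [evaluate(p) for p in population]
        gen_best = min(scores)
        # Kill off environments whose attacker reward is clearly too high.
        survivors = [p for p, s in zip(population, scores) if s <= gen_best + margin]
        population = list(survivors)
        attempts = 0
        while len(population) < pop_size and attempts < 50 * pop_size:
            attempts += 1
            child = mutate(rng.choice(survivors))
            if child not in population:  # crude diversity: no duplicate plans
                population.append(child)
        for p in population:
            s = evaluate(p)
            if s < best:
                best, best_plan = s, p
    return best_plan, best


# Hypothetical toy AD attack graph: (source, destination) access edges.
EDGES = {
    ("alice", "ws1"), ("bob", "ws2"), ("ws1", "srv"), ("ws2", "srv"),
    ("ws2", "helpdesk"), ("helpdesk", "DA"), ("srv", "DA"),
}
ENTRIES = ["alice", "bob"]
# Assumption: the defender may not revoke the users' own logon edges.
REMOVABLE = EDGES - {("alice", "ws1"), ("bob", "ws2")}

if __name__ == "__main__":
    plan, reward = edo_defend(EDGES, REMOVABLE, 2, ENTRIES, "DA", seed=1)
    print(sorted(plan), reward)
```

In this toy graph every single-edge removal still leaves a three-hop path to `DA` (reward 97), while a two-edge plan that revokes both `("srv", "DA")` and `("helpdesk", "DA")` makes `DA` unreachable (reward 0). Killing relative to the generation's best reward, rather than against a fixed threshold, keeps several near-best plans alive, which loosely mirrors the motivation that a slightly worse environment may pay off later.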