We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game, we follow an approach in which attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work demonstrate the feasibility of this approach for small infrastructures but do not scale to realistic scenarios because the computational complexity grows exponentially with the infrastructure size. We address this problem by introducing a method that recursively decomposes the game into subgames that can be solved in parallel. Applying optimal stopping theory, we show that the best response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game, we introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that DFSP significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.
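To illustrate what a threshold structure in a stopping strategy means, the following is a minimal sketch and not the method of the paper. It assumes a single hidden binary state (intrusion or no intrusion), a defender belief updated by Bayes' rule, and a fixed threshold; the names `update_belief`, `threshold_strategy`, and `first_stop_time`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple


def update_belief(belief: float, p_obs_intrusion: float, p_obs_normal: float) -> float:
    """Bayes update of P(intrusion) after one observation.

    p_obs_intrusion and p_obs_normal are the (assumed) likelihoods of the
    observation under the intrusion and no-intrusion states, respectively.
    """
    numerator = p_obs_intrusion * belief
    denominator = numerator + p_obs_normal * (1.0 - belief)
    # If the observation has zero probability under both states, keep the belief.
    return numerator / denominator if denominator > 0 else belief


def threshold_strategy(belief: float, threshold: float) -> str:
    """Take the stop (response) action once the belief reaches the threshold."""
    return "stop" if belief >= threshold else "continue"


def first_stop_time(
    prior: float,
    likelihoods: Sequence[Tuple[float, float]],
    threshold: float,
) -> Optional[int]:
    """Return the first time step (1-indexed) at which the strategy stops,
    or None if the threshold is never reached."""
    belief = prior
    for t, (p_i, p_n) in enumerate(likelihoods, start=1):
        belief = update_belief(belief, p_i, p_n)
        if threshold_strategy(belief, threshold) == "stop":
            return t
    return None


if __name__ == "__main__":
    # Hypothetical observation sequence: each alert is four times more
    # likely under an intrusion than under normal operation.
    observations = [(0.8, 0.2)] * 3
    print(first_stop_time(prior=0.1, likelihoods=observations, threshold=0.75))
```

Under these assumptions, the strategy is fully described by one scalar threshold, so searching for a best response reduces to searching over that scalar rather than over arbitrary mappings from beliefs to actions.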