We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game, we follow an approach in which attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work demonstrate the feasibility of this approach for small infrastructures but do not scale to realistic scenarios because computational complexity grows exponentially with infrastructure size. We address this problem by introducing a method that recursively decomposes the game into subgames that can be solved in parallel. Applying optimal stopping theory, we show that the best-response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game, we introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that DFSP significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.
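To make the self-play idea concrete, the following minimal sketch runs classic fictitious play on a toy zero-sum matrix game. It is not DFSP and does not model partial observability, game decomposition, or threshold strategies; the function name `fictitious_play`, the `MATCHING_PENNIES` payoff matrix, and the iteration count are illustrative assumptions that do not come from the paper. Each player repeatedly best-responds to the empirical action frequencies of the other, and in zero-sum games these frequencies are known to converge to a Nash equilibrium.

```python
# Illustrative toy game (an assumption, not from the paper): the row player
# (think "defender") wins 1 when the actions match and loses 1 otherwise.
MATCHING_PENNIES = [
    [1, -1],
    [-1, 1],
]


def fictitious_play(payoff, iterations=10000):
    """Classic fictitious play for a two-player zero-sum matrix game.

    payoff[i][j] is the reward to the row player (maximizer) when the row
    player picks action i and the column player (minimizer) picks action j.
    Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])

    # Counts of how often each action has been played; both players start
    # by playing their first action once so the averages are well defined.
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1
    col_counts[0] += 1

    for _ in range(iterations):
        # Value of each row action against the column player's history
        # (unnormalized counts give the same argmax as frequencies).
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Payoff conceded by each column action against the row history.
        col_values = [
            sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]

        # Simultaneous best responses; ties go to the lowest index.
        best_row = max(range(n_rows), key=lambda i: row_values[i])
        best_col = min(range(n_cols), key=lambda j: col_values[j])

        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_total = sum(row_counts)
    col_total = sum(col_counts)
    row_freq = [c / row_total for c in row_counts]
    col_freq = [c / col_total for c in col_counts]
    return row_freq, col_freq


if __name__ == "__main__":
    row_freq, col_freq = fictitious_play(MATCHING_PENNIES)
    print("row player frequencies:", row_freq)
    print("column player frequencies:", col_freq)
```

In this toy game the unique equilibrium is for both players to mix uniformly, so both frequency vectors should drift toward equal weights as the iteration count grows. Convergence is slow, which hints at why exact best-response computation becomes a bottleneck in larger games.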