We study automated security response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed, non-stationary game. We relax the standard assumption that the game model is correctly specified and consider that each player has a probabilistic conjecture about the model, which may be misspecified in the sense that the true model has probability 0. This formulation allows us to capture uncertainty about the infrastructure and the intents of the players. To learn effective game strategies online, we design a novel method where a player iteratively adapts its conjecture using Bayesian learning and updates its strategy through rollout. We prove that the conjectures converge to best fits, and we provide a bound on the performance improvement that rollout enables with a conjectured model. To characterize the steady state of the game, we propose a variant of the Berk-Nash equilibrium. We present our method through an advanced persistent threat use case. Simulation studies based on testbed measurements show that our method produces effective security strategies that adapt to a changing environment. We also find that our method enables faster convergence than current reinforcement learning techniques.
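The convergence claim for misspecified conjectures can be illustrated with a toy example. The sketch below is not the paper's method or model: it assumes a single Bernoulli observation process and a finite set of candidate parameters that deliberately excludes the true one, and all names (`bayes_update`, `learn_conjecture`, `kl_bernoulli`) are hypothetical. Under these assumptions, the posterior should concentrate on the candidate with the smallest Kullback-Leibler divergence from the true distribution, which is one common reading of "best fit".

```python
import math
import random


def bayes_update(posterior, likelihoods):
    """One Bayesian step: posterior[i] is proportional to posterior[i] * likelihoods[i]."""
    unnormalized = [p * l for p, l in zip(posterior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def kl_bernoulli(p, q):
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)) for p, q strictly in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def learn_conjecture(true_p, candidates, steps, seed=0):
    """Update a uniform prior over candidate models from simulated observations.

    The conjecture is misspecified whenever true_p is not in candidates,
    i.e. the true model has prior probability zero.
    """
    rng = random.Random(seed)
    posterior = [1.0 / len(candidates)] * len(candidates)
    for _ in range(steps):
        # Observation generated by the true (unknown) model.
        obs = 1 if rng.random() < true_p else 0
        # Likelihood of the observation under each conjectured model.
        likelihoods = [q if obs == 1 else 1.0 - q for q in candidates]
        posterior = bayes_update(posterior, likelihoods)
    return posterior


if __name__ == "__main__":
    true_p = 0.3                   # assumed true parameter (illustrative)
    candidates = [0.1, 0.4, 0.8]   # conjectured models; 0.3 is excluded
    posterior = learn_conjecture(true_p, candidates, steps=2000)
    divergences = [kl_bernoulli(true_p, q) for q in candidates]
    best_fit = candidates[divergences.index(min(divergences))]
    print("posterior:", posterior)
    print("KL-minimizing candidate:", best_fit)
```

In this toy setting the candidate 0.4 minimizes the divergence from the true parameter 0.3, so the posterior mass is expected to accumulate there even though no candidate is correct. The strategy update through rollout is not covered by this sketch.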