We develop a queueing-theoretic framework to model the temporal evolution of cyber-attack surfaces, where the number of active vulnerabilities is represented as the backlog of a queue. Vulnerabilities arrive as they are discovered or created, and leave the system when they are patched or successfully exploited. Building on this model, we study how automation affects attack and defense dynamics by introducing an AI amplification factor that scales arrival, exploit, and patching rates. Our analysis shows that even symmetric automation can increase the rate of successful exploits. We validate the model using vulnerability data collected from an open-source software supply chain and show that it closely matches real-world attack surface dynamics. Empirical results reveal heavy-tailed patching times, which we prove induce long-range dependence in the vulnerability backlog and help explain persistent cyber risk. Using this queueing abstraction of the attack surface, we develop a systematic approach to cyber risk mitigation. We formulate the dynamic defense problem as a constrained Markov decision process with resource-budget and switching-cost constraints, and develop a reinforcement learning (RL) algorithm that achieves provably near-optimal regret. Numerical experiments validate the approach and demonstrate that our adaptive RL-based defense policies significantly reduce successful exploits and mitigate heavy-tail queue events. Using trace-driven experiments on the ARVO dataset, we show that the proposed RL-based defense policy reduces the average number of active vulnerabilities in a software supply chain by over 90% compared to existing defense practices, without increasing the overall maintenance budget. Our results allow defenders to quantify cumulative exposure risk under long-range-dependent attack dynamics and to design adaptive defense strategies with provable efficiency.
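The claim that symmetric automation can still raise the exploit rate can be illustrated with a toy calculation. The sketch below is not the model analyzed in this work; it assumes Poisson arrivals and independent exponential patch and exploit clocks for each vulnerability (an infinite-server queue with two competing exits), which ignores the heavy-tailed patching times reported above. The names `Rates`, `amplify`, `mean_backlog`, and `exploit_throughput` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Assumed rates for a toy attack-surface queue (all per unit time)."""
    arrival: float  # new vulnerabilities discovered or created
    patch: float    # per-vulnerability patching rate
    exploit: float  # per-vulnerability successful-exploit rate


def amplify(rates: Rates, factor: float) -> Rates:
    """Scale all three rates by the same (symmetric) amplification factor."""
    return Rates(
        arrival=factor * rates.arrival,
        patch=factor * rates.patch,
        exploit=factor * rates.exploit,
    )


def mean_backlog(rates: Rates) -> float:
    """Stationary mean number of active vulnerabilities.

    Under the assumed infinite-server model, each vulnerability stays for an
    exponential time with rate (patch + exploit), so the mean backlog is
    arrival / (patch + exploit).
    """
    return rates.arrival / (rates.patch + rates.exploit)


def exploit_throughput(rates: Rates) -> float:
    """Long-run rate of successful exploits.

    Each vulnerability ends in an exploit with probability
    exploit / (patch + exploit), so the throughput is the arrival rate
    times that probability.
    """
    return rates.arrival * rates.exploit / (rates.patch + rates.exploit)


if __name__ == "__main__":
    base = Rates(arrival=10.0, patch=0.8, exploit=0.2)
    boosted = amplify(base, factor=3.0)
    for label, r in (("baseline", base), ("amplified x3", boosted)):
        print(f"{label}: backlog={mean_backlog(r):.2f}, "
              f"exploits per unit time={exploit_throughput(r):.2f}")
```

In this simplified setting the mean backlog is unchanged by symmetric scaling, because the faster arrivals are offset by faster departures, while the exploit throughput grows in proportion to the amplification factor. The attack surface looks no larger at any instant, yet more vulnerabilities are exploited per unit time.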