We formulate operating-system vulnerability discovery as a \emph{repeated Bayesian Stackelberg search game} in which a large reasoning model (LRM) orchestrator allocates analysis budget across kernel files, functions, and attack paths, while external verifiers (static analyzers, fuzzers, and sanitizers) provide evidence. In each round, the orchestrator selects a target component, an analysis method, and a time budget; observes the tool outputs; updates its Bayesian beliefs over latent vulnerability states; and re-solves the game to minimize the strategic attacker's expected payoff. We introduce \textsc{VCAO} (\textbf{V}erifier-\textbf{C}entered \textbf{A}gentic \textbf{O}rchestration), a six-layer architecture comprising surface mapping, intra-kernel attack-graph construction, game-theoretic file/function ranking, parallel executor agents, cascaded verification, and a safety governor. A mixed-integer linear program (MILP) derived from DOBSS allocates budget optimally across heterogeneous analysis tools under resource constraints, with formal $\tilde{O}(\sqrt{T})$ regret bounds that follow from online Stackelberg learning. Experiments on five Linux kernel subsystems, which replay 847 historical CVEs and run live discovery on upstream snapshots, show that \textsc{VCAO} discovers $2.7\times$ more validated vulnerabilities per unit budget than coverage-only fuzzing, $1.9\times$ more than static-analysis-only baselines, and $1.4\times$ more than non-game-theoretic multi-agent pipelines, while reducing the rate of false positives that reach human reviewers by 68\%. We release our simulation framework, synthetic attack-graph generator, and evaluation harness as open-source artifacts.
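
To make the per-round belief update concrete, the following is a minimal sketch under assumptions that the text above does not state: each component $i$ has a binary latent state $v_i \in \{0,1\}$, the orchestrator holds a belief $b_i^t = \Pr(v_i = 1 \mid \text{history up to round } t)$, and an analysis method $m$ raises an alarm with a hypothetical true-positive rate $\alpha_m$ and false-positive rate $\beta_m$. Under these illustrative assumptions, Bayes' rule gives
\[
b_i^{t+1} =
\begin{cases}
\dfrac{\alpha_m\, b_i^t}{\alpha_m\, b_i^t + \beta_m\,(1 - b_i^t)} & \text{if method } m \text{ raises an alarm on } i, \\[2ex]
\dfrac{(1-\alpha_m)\, b_i^t}{(1-\alpha_m)\, b_i^t + (1-\beta_m)\,(1 - b_i^t)} & \text{otherwise.}
\end{cases}
\]
For example, with the made-up values $b_i^t = 0.1$, $\alpha_m = 0.8$, and $\beta_m = 0.05$, an alarm raises the belief to $0.08 / (0.08 + 0.045) = 0.64$, whereas a clean run lowers it to $0.02 / (0.02 + 0.855) \approx 0.023$. The actual observation model may be richer, for instance with correlated evidence along attack paths.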