Beyond Runtime Enforcement: Shield Synthesis as Defensibility Analysis for Adversarial Networks

Shielded reinforcement learning is typically presented as a runtime safety mechanism that compiles temporal-logic specifications into automata restricting an agent's actions. We argue this is the wrong product. The same automata-theoretic machinery -- specification compilation, product game construction, attractor computation, and winning-region extraction -- is better read as a design-time analytical instrument whose outputs are structural insights about a system rather than runtime constraints on a deployed agent. We instantiate this through a constrained two-player safety game for network defense. The two specifications are enforced asymmetrically: the defender specification defines the unsafe region of the game, whereas the attacker specification restricts the adversary's legal actions during attractor computation. Solving the game yields a defensibility verdict -- a formal certificate that a topology-specification pair is or is not defensible -- with the associated winning region and shield. Beyond the binary verdict, we derive topology-level metrics from the attractor structure and combine them with post-convergence behavior from shield-constrained adversarial multi-agent reinforcement learning. Together these form a defensibility fingerprint capturing both a network's formal safety properties and its operational behavior under adaptive play. A what-if analysis shows that formal defensibility and operational effectiveness capture distinct aspects of security: small architectural changes can produce large shifts in operational outcomes while leaving formal safety margins nearly unchanged. Shield synthesis is thus most valuable not as a deployment mechanism for safe agents, but as a framework for answering architectural questions about whether, where, and how a system can be defended. The defensibility verdict is the output, not the safe policy.
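The pipeline described above (attractor computation, winning-region extraction, verdict, and shield) can be illustrated with a minimal sketch. It assumes a finite, turn-based game graph in which each state is owned by either the defender or the attacker; the function names, the three-state toy topology, and the `attacker_legal` predicate are hypothetical illustrations, not the authors' implementation. The defender specification is represented only by the set of unsafe states, and the attacker specification only by a filter on attacker moves, mirroring the asymmetric enforcement the abstract describes.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

State = Hashable


def attacker_attractor(
    owner: Dict[State, str],
    edges: Dict[State, List[State]],
    unsafe: Set[State],
    attacker_legal: Callable[[State, State], bool],
) -> Set[State]:
    """States from which the attacker can force a visit to `unsafe`.

    Computed as a least fixed point. Assumption: a state with no
    (legal) outgoing move is treated as safe for the defender.
    """
    attr = set(unsafe)
    changed = True
    while changed:
        changed = False
        for s in owner:
            if s in attr:
                continue
            succs = edges.get(s, [])
            if owner[s] == "attacker":
                # Attacker needs one legal move into the attractor.
                forced = any(t in attr for t in succs if attacker_legal(s, t))
            else:
                # Defender is lost only if every move enters the attractor.
                forced = bool(succs) and all(t in attr for t in succs)
            if forced:
                attr.add(s)
                changed = True
    return attr


def synthesize(
    owner: Dict[State, str],
    edges: Dict[State, List[State]],
    unsafe: Set[State],
    attacker_legal: Callable[[State, State], bool],
    initial: State,
) -> Tuple[bool, Set[State], Dict[State, Set[State]]]:
    """Return (defensibility verdict, winning region, shield)."""
    attr = attacker_attractor(owner, edges, unsafe, attacker_legal)
    winning = set(owner) - attr
    # The shield keeps only defender moves that stay in the winning region.
    shield = {
        s: {t for t in edges.get(s, []) if t in winning}
        for s in winning
        if owner[s] == "defender"
    }
    return initial in winning, winning, shield


# Hypothetical toy topology: defender state d0, attacker state a1, unsafe state bad.
owner = {"d0": "defender", "a1": "attacker", "bad": "attacker"}
edges = {"d0": ["a1"], "a1": ["d0", "bad"], "bad": ["bad"]}
unsafe = {"bad"}

# Unconstrained attacker: every move is legal.
verdict_free, win_free, shield_free = synthesize(
    owner, edges, unsafe, lambda s, t: True, "d0"
)

# Attacker specification (assumed for illustration) forbids the move a1 -> bad.
verdict_spec, win_spec, shield_spec = synthesize(
    owner, edges, unsafe, lambda s, t: (s, t) != ("a1", "bad"), "d0"
)

print("unconstrained attacker:", verdict_free, win_free)
print("constrained attacker:  ", verdict_spec, win_spec, shield_spec)
```

In this toy instance the same topology is not defensible against an unconstrained attacker but becomes defensible once the attacker specification removes the single move into the unsafe state. The verdict and winning region, rather than the shield itself, are the quantities of interest here.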
