Deep reinforcement learning (DRL) has shown remarkable performance on complex control problems in systems and networking, including adaptive video streaming, wireless resource management, and congestion control. For safe deployment, however, it is critical to reason about how agents behave across the range of system states they encounter in practice. Existing verification-based methods in this domain primarily focus on point properties, defined around fixed input states, which offer limited coverage and require substantial manual effort to identify relevant input-output pairs for analysis. In this paper, we study symbolic properties, that specify expected behavior over ranges of input states, for DRL agents in systems and networking. We present a generic formulation for symbolic properties, with monotonicity and robustness as concrete examples, and show how they can be analyzed using existing DNN verification engines. Our approach encodes symbolic properties as comparisons between related executions of the same policy and decomposes them into practically tractable sub-properties. These techniques serve as practical enablers for applying existing verification tools to symbolic analysis. Using our framework, diffRL, we conduct an extensive empirical study across three DRL-based control systems, adaptive video streaming, wireless resource management, and congestion control. Through these case studies, we analyze symbolic properties over broad input ranges, examine how property satisfaction evolves during training, study the impact of model size on verifiability, and compare multiple verification backends. Our results show that symbolic properties provide substantially broader coverage than point properties and can uncover non-obvious, operationally meaningful counterexamples, while also revealing practical solver trade-offs and limitations.
翻译:深度强化学习在系统和网络中的复杂控制问题(包括自适应视频流、无线资源管理和拥塞控制)上展现了卓越性能。然而,为确保安全部署,关键在于推理智能体在实际系统状态范围内的行为模式。该领域中现有基于验证的方法主要聚焦于点性质(point properties),即针对固定输入状态进行定义,这些方法覆盖范围有限,且需要大量人工工作来识别相关输入输出对进行分析。本文研究了系统和网络中DRL智能体的符号性质(symbolic properties),这类性质可在输入状态范围内描述期望行为。我们提出了符号性质的一般性形式化框架,以单调性和鲁棒性为具体实例,并展示了如何利用现有深度神经网络验证引擎对其进行分析。该方法将符号性质编码为同一策略相关执行之间的比较,并将其分解为实际可处理的子性质。这些技术作为关键使能手段,使得现有验证工具可应用于符号分析。基于我们的框架diffRL,我们在三类基于DRL的控制系统(自适应视频流、无线资源管理和拥塞控制)中开展了广泛实证研究。通过案例研究,我们分析了宽输入范围内的符号性质演化规律,探讨了模型规模对可验证性的影响,并比较了多个验证后端的性能差异。实验结果表明:符号性质相比点性质具有显著更广的覆盖率,能发现非直观且具有操作意义的反例,同时揭示了实际求解器的权衡与局限性。