Diffusion Model-based Reinforcement Learning for Version Age of Information Scheduling: Average and Tail-Risk-Sensitive Control

Ensuring timely and semantically accurate information delivery is critical in real-time wireless systems. While Age of Information (AoI) quantifies temporal freshness, Version Age of Information (VAoI) captures semantic staleness by accounting for version evolution between transmitters and receivers. Existing VAoI scheduling approaches primarily focus on minimizing average VAoI, overlooking rare but severe staleness events that can compromise reliability under stochastic packet arrivals and unreliable channels. This paper investigates both average-oriented and tail-risk-sensitive VAoI scheduling in a multi-user status update system with long-term transmission cost constraints. We first formulate the average VAoI minimization problem as a constrained Markov decision process and introduce a deep diffusion-based Soft Actor-Critic (D2SAC) algorithm. By generating actions through a diffusion-based denoising process, D2SAC enhances policy expressiveness and establishes a strong baseline for mean performance. Building on this foundation, we put forth RS-D3SAC, a risk-sensitive deep distributional diffusion-based Soft Actor-Critic algorithm. RS-D3SAC integrates a diffusion-based actor with a quantile-based distributional critic, explicitly modeling the full VAoI return distribution. This enables principled tail-risk optimization via Conditional Value-at-Risk (CVaR) while satisfying long-term transmission cost constraints. Extensive simulations show that, while D2SAC reduces average VAoI, RS-D3SAC consistently achieves substantial reductions in CVaR without sacrificing mean performance. The dominant gain in tail-risk reduction stems from the distributional critic, with the diffusion-based actor providing complementary refinement to stabilize and enrich policy decisions, highlighting their effectiveness for robust and risk-aware VAoI scheduling in multi-user wireless systems.

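As a rough illustration of how a quantile-based distributional critic can feed a CVaR objective, the sketch below estimates the upper-tail CVaR of a cost such as VAoI from a set of equally weighted quantile estimates. The abstract does not specify the estimator used in RS-D3SAC, so the function name `cvar_from_quantiles`, the midpoint quantile levels, and the risk level `alpha` are all assumptions made for illustration only.

```python
import numpy as np


def cvar_from_quantiles(quantile_values, alpha=0.9):
    """Approximate upper-tail CVaR of a cost distribution.

    Assumes (hypothetically) that the critic outputs N equally weighted
    quantile estimates at midpoint levels tau_i = (i + 0.5) / N. Because
    VAoI is a cost, the risky tail is the upper one, so CVaR at level
    alpha is taken as the mean of the quantiles with tau_i >= alpha.
    """
    q = np.sort(np.asarray(quantile_values, dtype=float))
    n = q.size
    taus = (np.arange(n) + 0.5) / n  # midpoint quantile levels
    tail = q[taus >= alpha]          # worst (1 - alpha) fraction of outcomes
    if tail.size == 0:
        # alpha lies beyond the last quantile level: fall back to the largest estimate
        tail = q[-1:]
    return float(tail.mean())


if __name__ == "__main__":
    # Ten illustrative quantile estimates of the VAoI return (made-up values)
    example_quantiles = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(cvar_from_quantiles(example_quantiles, alpha=0.8))
    print(cvar_from_quantiles(example_quantiles, alpha=0.0))
```

With `alpha=0.0` the estimate reduces to the plain mean of the quantiles, which is one way to see how a single distributional critic could serve both the average-oriented and the tail-risk-sensitive objectives under these assumptions.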