Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions, all while providing reliability guarantees. We present STREAM-RL, a unified framework with three novel algorithmic contributions: (1) PU-GAT+, an Uncertainty-Guided Adaptive Conformal Forecaster that uses prediction uncertainty to dynamically reweight graph attention via confidence-monotonic attention, achieving distribution-free coverage guarantees; (2) CRFN-BY, a Conformal Residual Flow Network that models uncertainty-normalized residuals via normalizing flows, with Benjamini-Yekutieli false discovery rate (FDR) control under arbitrary dependence; and (3) LyCon-WRL+, an Uncertainty-Guided Safe World-Model RL agent with Lyapunov stability certificates, certified Lipschitz bounds, and uncertainty-propagated imagination rollouts. To our knowledge, this is the first framework to propagate calibrated uncertainty from forecasting through anomaly detection to safe policy learning with end-to-end theoretical guarantees. Experiments on multiple real-world traffic trajectory datasets demonstrate that STREAM-RL achieves 91.4\% coverage efficiency, controls FDR at 4.1\% under verified dependence, and raises the safety rate to 95.2\%, compared with 69\% for standard PPO, while achieving higher reward, with 23 ms end-to-end inference latency.
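To make the second contribution more concrete, the following is a minimal Python sketch of the generic pipeline the abstract describes: residuals are normalized by predicted uncertainty, converted to conformal p-values against a calibration set, and screened with the Benjamini-Yekutieli step-up procedure. It is an illustration under stated assumptions, not the CRFN-BY implementation: the absolute normalized residual stands in for the normalizing-flow score, and the function names `normalized_scores`, `conformal_p_values`, and `benjamini_yekutieli` are hypothetical.

```python
from typing import List, Sequence


def normalized_scores(y: Sequence[float], mu: Sequence[float],
                      sigma: Sequence[float]) -> List[float]:
    """Uncertainty-normalized residuals |y - mu| / sigma.

    Assumption: a plain absolute residual replaces the flow-based score
    mentioned in the abstract.
    """
    return [abs(yi - mi) / si for yi, mi, si in zip(y, mu, sigma)]


def conformal_p_values(calib_scores: Sequence[float],
                       test_scores: Sequence[float]) -> List[float]:
    """Conformal p-value: (1 + #{calibration scores >= test score}) / (n + 1)."""
    n = len(calib_scores)
    return [(1 + sum(1 for c in calib_scores if c >= s)) / (n + 1)
            for s in test_scores]


def benjamini_yekutieli(p_values: Sequence[float],
                        alpha: float = 0.05) -> List[bool]:
    """Rejection mask from the Benjamini-Yekutieli step-up procedure.

    Controls the FDR at level alpha under arbitrary dependence by shrinking
    the Benjamini-Hochberg thresholds with the harmonic sum c(m).
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0  # largest rank whose p-value falls under its threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject
```

In use, the scores of held-out normal traffic would form the calibration set, and each new observation whose p-value is rejected would be flagged as an anomaly. The harmonic factor makes the procedure more conservative than Benjamini-Hochberg, which is the price of dropping the independence assumption.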