Modeling Information Blackouts in Missing Not-At-Random Time Series Data

Large-scale traffic forecasting relies on fixed sensor networks that often exhibit blackouts: contiguous intervals of missing measurements caused by detector or communication failures. These outages are typically handled under a Missing At Random (MAR) assumption, even though blackout events may correlate with unobserved traffic conditions (e.g., congestion or anomalous flow), motivating a Missing Not At Random (MNAR) treatment. We propose a latent state-space framework that jointly models (i) traffic dynamics via a linear dynamical system and (ii) sensor dropout via a Bernoulli observation channel whose probability depends on the latent traffic state. Inference uses an Extended Kalman Filter with Rauch-Tung-Striebel smoothing, and parameters are learned via an approximate EM procedure with a dedicated update for detector-specific missingness parameters. On the Seattle inductive loop detector data, introducing latent dynamics yields large gains over naive baselines, reducing blackout imputation RMSE from 7.02 (LOCF) and 5.02 (linear interpolation + seasonal naive) to 4.23 (MAR LDS), corresponding to about a 64% reduction in MSE relative to LOCF. Explicit MNAR modeling provides a consistent but smaller additional improvement on real data (imputation RMSE 4.20; 0.8% RMSE reduction relative to MAR), with similar modest gains for short-horizon post-blackout forecasts (evaluated at 1, 3, and 6 steps). In controlled synthetic experiments, the MNAR advantage increases as the true missingness dependence on latent state strengthens. Overall, temporal dynamics dominate performance, while MNAR modeling offers a principled refinement that becomes most valuable when missingness is genuinely informative.
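The generative structure described above (linear latent dynamics plus a Bernoulli dropout channel driven by the latent state) can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes a scalar latent state, a logistic link for the dropout probability, and hypothetical parameter names (`phi`, `q`, `r`, `b0`, `b1`), none of which are specified in the abstract. Setting `b1 = 0` makes dropout independent of the latent state, while `b1 != 0` makes missingness informative (MNAR).

```python
import numpy as np

def simulate_mnar_lds(T=500, phi=0.95, q=1.0, r=0.5, b0=-3.0, b1=0.0, seed=0):
    """Simulate a scalar linear-Gaussian latent state with state-dependent dropout.

    All parameters are illustrative assumptions:
      phi    : latent autoregressive coefficient
      q, r   : process and observation noise variances
      b0, b1 : intercept and slope of the (assumed) logistic missingness model
    """
    rng = np.random.default_rng(seed)

    # Latent traffic state: x_t = phi * x_{t-1} + process noise.
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal(0.0, np.sqrt(q))

    # Noisy sensor reading of the latent state.
    y = x + rng.normal(0.0, np.sqrt(r), size=T)

    # Bernoulli dropout channel: P(missing_t) = sigmoid(b0 + b1 * x_t).
    p_miss = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    missing = rng.random(T) < p_miss

    # Mask dropped readings with NaN.
    y_obs = np.where(missing, np.nan, y)
    return x, y_obs, missing


if __name__ == "__main__":
    x, y_obs, missing = simulate_mnar_lds(T=5000, b1=1.5)
    print("fraction missing:", missing.mean())
    print("mean latent state when missing :", x[missing].mean())
    print("mean latent state when observed:", x[~missing].mean())
```

Because the latent state is persistent, a positive `b1` tends to produce contiguous runs of missing values while the state is high, and the observed readings then under-represent those high-state periods. That selection effect is what a MAR treatment ignores.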
