Recurrent neural networks maintain a hidden state $h_t$, but its probabilistic meaning is often unclear. We study hidden-state stability through \emph{backward coherence}: the extent to which $h_t$ can be reconstructed from $h_{t+1}$ by a learned backward projector $g_\phi$. Under contraction and summable backward drift, the hidden-state sequence forms a quasi-reverse-martingale. This yields almost-sure convergence, rates under mixing, an interpretable limiting representation, finite pathwise stopping times, and a theoretical framework for time-uniform confidence sequences. Simulations support the theory. Backward-coherence regularisation reduces the empirical quasi-martingale total $\hat Q$ by $43$--$58\%$, reaches stability $28$--$44\%$ earlier than an unregularised RNN, and gives tracking-error recovery consistent with geometric bounds. Additional tests confirm echo-state forgetting rates bounded by $\rho$ and verify the increment-sum tube $R_t$ with $100\%$ simultaneous coverage, although $R_t$ is conservative; in practice, the defect-tail proxy $\hat Q_t$ is the more useful monitor. Minimising the backward-coherence loss is also equivalent to minimising a Kullback--Leibler divergence in a Gaussian backward model, which links the method to variational inference. Extensions cover $\phi$-mixing inputs, change-point tracking, and finite-sample concentration. Three real-data studies further validate the approach. On PhysioNet 2012 ICU data, the Reverse Martingale RNN (RMRNN) matches the RNN's mortality-prediction AUC while reaching stable representations 13 hours earlier. On FRED-MD, it reduces one-month-ahead forecast error by a factor of about four under concept drift. On UCI Human Activity Recognition, it maintains lower post-transition tracking error, with geometric decay. The guarantees hold under the stated assumptions; universality is not claimed.
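As an illustrative sketch of the Kullback--Leibler link (not necessarily the exact construction used in the paper), assume that the backward-coherence loss is the squared reconstruction error $\mathrm{E}\,\|h_t - g_\phi(h_{t+1})\|^2$ and that the Gaussian backward model is isotropic with a fixed variance, $q_\phi(h_t \mid h_{t+1}) = \mathcal{N}\bigl(g_\phi(h_{t+1}),\, \sigma^2 I_d\bigr)$. The symbols $p$, $q_\phi$, $\sigma^2$, $d$, and $H$ are introduced here for illustration only. If $p(h_t \mid h_{t+1})$ denotes the true backward conditional and its conditional differential entropy $H(h_t \mid h_{t+1})$ is finite, then
\[
\mathrm{E}\Bigl[\mathrm{KL}\bigl(p(\cdot \mid h_{t+1}) \,\big\|\, q_\phi(\cdot \mid h_{t+1})\bigr)\Bigr]
= \frac{1}{2\sigma^2}\,\mathrm{E}\,\|h_t - g_\phi(h_{t+1})\|^2 + \frac{d}{2}\log\bigl(2\pi\sigma^2\bigr) - H(h_t \mid h_{t+1}).
\]
The last two terms do not depend on $\phi$, so under these assumptions minimising the expected divergence over $\phi$ is the same as minimising the squared backward reconstruction error.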