Poisson equations underpin average-reward reinforcement learning, but beyond ergodicity they can be ill-posed: solutions are non-unique, and standard fixed-point iterations can oscillate on reducible or periodic chains. We study finite-state Markov chains with $n$ states and transition matrix $P$. We show that all non-decaying modes are captured by a real peripheral invariant subspace $\mathcal{K}(P)$, and that the induced operator on the quotient space $\mathbb{R}^n/\mathcal{K}(P)$ is strictly contractive, yielding a unique quotient solution. Building on this viewpoint, we develop an end-to-end pipeline that learns the chain structure, estimates an anchor-based gauge map, and runs projected stochastic approximation to estimate a gauge-fixed representative together with an associated peripheral residual. We prove $\widetilde{O}(T^{-1/2})$ convergence up to projection estimation error, enabling stable Poisson-equation learning in multichain and periodic regimes, with applications to performance evaluation for average-reward reinforcement learning beyond ergodicity.
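The phenomenon the abstract describes can be seen numerically. The following is a minimal sketch, not the paper's pipeline: it uses a toy 3-state chain of period 2 (the matrix $P$, reward $r$, and the orthogonal-projector construction are our own illustrative choices). The peripheral eigenvalues of this $P$ are $\pm 1$, so $\mathcal{K}(P)$ is two-dimensional; the remaining eigenvalue is $0$, so the induced map on the quotient is strictly contractive. The raw fixed-point iteration $h \leftarrow r - \rho\mathbf{1} + Ph$ cycles with period 2, while the same iteration projected off $\mathcal{K}(P)$ converges to a fixed quotient representative.

```python
import numpy as np

# Toy 3-state periodic chain (period 2): states {0, 2} alternate with state 1.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
r = np.array([1.0, 0.0, 0.0])

# Stationary distribution and average reward (well-defined despite periodicity).
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
rho = pi @ r                                        # approx. 0.25 for this chain

# Peripheral invariant subspace K(P): eigenvectors with |lambda| = 1.
w, V = np.linalg.eig(P)
K = np.real(V[:, np.abs(np.abs(w) - 1.0) < 1e-9])   # here span{(1,1,1), (1,-1,1)}
Q = np.eye(3) - K @ np.linalg.pinv(K)               # orthogonal projector off K(P)

# Raw iteration h <- r - rho*1 + P h picks up the non-decaying eigenvalue -1
# and cycles; the projected iteration contracts (only eigenvalue 0 survives).
h_raw = np.zeros(3)
h_proj = np.zeros(3)
for _ in range(50):
    h_raw = r - rho + P @ h_raw
    h_proj = Q @ (r - rho + P @ h_proj)

print(np.round(h_raw, 3))    # still oscillating: the K(P) component never decays
print(np.round(h_proj, 3))   # converged gauge-fixed representative
```

Because $\mathcal{K}(P)$ is $P$-invariant, the projected operator inherits exactly the non-peripheral spectrum, which is the quotient-contraction claim in miniature; the paper's pipeline estimates this projection from data rather than from the known $P$.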