Improving the convergence of Markov chains via permutations and projections

This paper aims to improve the convergence to equilibrium of finite ergodic Markov chains via permutations and projections. First, we prove that a specific mixture of permuted Markov chains arises naturally as a projection under the KL divergence or the squared Frobenius norm. We then compare various mixing properties of the mixture with those of competing Markov chain samplers and demonstrate that it enjoys improved convergence. This geometric perspective motivates us to propose samplers based on alternating projections that combine different permutations, and to analyze their rate of convergence. In terms of the trace of the transition matrix, we give necessary, and under some additional assumptions also sufficient, conditions for the projection to achieve stationarity in the limit. We proceed to discuss tuning strategies for the projection samplers when the permutations are viewed as parameters. Along the way, we reveal connections between the mixture and a Markov chain Sylvester's equation as well as assignment problems, and highlight how these can be used to understand and improve Markov chain mixing. We provide two examples as illustrations. In the first, the projection sampler (with a suitable choice of permutation) improves upon Metropolis-Hastings on a discrete bimodal distribution, reducing the relaxation time from exponential to polynomial in the system size. In the second, the mixture of permuted Markov chains yields a mixing time that is logarithmic in the system size (with high probability under a random permutation), compared to a linear mixing time for the Diaconis-Holmes-Neal sampler.
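
As a rough illustration of why a permutation can help on a bimodal target, the following Python sketch compares a nearest-neighbour Metropolis chain with a mixture that, after each Metropolis move, applies a permutation with probability 1/2. Everything specific here is an assumption made for illustration and is not taken from the paper: the target `weights`, the reflection permutation `sigma` (chosen because it leaves the target unchanged), and the mixture form 0.5*P + 0.5*PQ. It need not coincide with the paper's mixture or projection sampler.

```python
import math

def metropolis(weights):
    """Nearest-neighbour Metropolis chain for an unnormalised target (toy example)."""
    n = len(weights)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                # Propose each neighbour with probability 1/2, then accept/reject.
                P[x][y] = 0.5 * min(1.0, weights[y] / weights[x])
        P[x][x] = 1.0 - sum(P[x])  # rejected and out-of-range proposals stay put
    return P

def step(mu, K):
    """One step of the chain: row vector mu times transition matrix K."""
    n = len(mu)
    return [sum(mu[x] * K[x][y] for x in range(n)) for y in range(n)]

def tv(mu, nu):
    """Total variation distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

# Assumed toy target: two modes at the end points, separated by a deep valley.
n, beta = 10, 3.0
weights = [math.exp(-beta * min(x, n - 1 - x)) for x in range(n)]
Z = sum(weights)
pi = [w / Z for w in weights]

P = metropolis(weights)

# Assumed permutation: reflection x -> n-1-x, which satisfies pi[sigma[x]] == pi[x].
sigma = [n - 1 - x for x in range(n)]

# Illustrative mixture 0.5*P + 0.5*P*Q, where Q is the permutation matrix of sigma.
# Since sigma is an involution, (P Q)[x][y] == P[x][sigma[y]].
M = [[0.5 * P[x][y] + 0.5 * P[x][sigma[y]] for y in range(n)] for x in range(n)]

def distance_after(K, t, start=0):
    """TV distance to pi after t steps of K, started from a point mass at `start`."""
    mu = [0.0] * n
    mu[start] = 1.0
    for _ in range(t):
        mu = step(mu, K)
    return tv(mu, pi)

tv_plain = distance_after(P, 50)
tv_mixed = distance_after(M, 50)
print("TV after 50 steps, Metropolis:", tv_plain)
print("TV after 50 steps, mixture:   ", tv_mixed)
```

In this toy setting both chains keep `pi` stationary, because the reflection preserves the target. The plain Metropolis chain started in one mode rarely crosses the valley, while the mixture can jump between modes at every step. Choosing the permutation so that it preserves the target is what makes the mixture a valid sampler here; how the paper selects and tunes its permutations is not reproduced by this sketch.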
