Information-theoretic minimax and submodular optimization algorithms for multivariate Markov chains

We study an information-theoretic minimax problem for finite multivariate Markov chains on $d$-dimensional product state spaces. Given a family $\mathcal B=\{P_1,\ldots,P_n\}$ of $\pi$-stationary transition matrices and a class $\mathcal F = \mathcal{F}(\mathbf{S})$ of factorizable models induced by a partition $\mathbf S$ of the coordinate set $[d]$, we seek to minimize the worst-case information loss by analyzing $$\min_{Q\in\mathcal F}\max_{P\in\mathcal B} D_{\mathrm{KL}}^{\pi}(P\|Q),$$ where $D_{\mathrm{KL}}^{\pi}(P\|Q)$ is the $\pi$-weighted KL divergence from $Q$ to $P$. Using strong duality and Pythagorean identities that we derive, we recast this minimax problem as a concave maximization over the $n$-probability-simplex. This leads us to formulate an information-theoretic game, show that a mixed-strategy Nash equilibrium always exists, and propose a projected subgradient algorithm that approximately solves the minimax problem with provable guarantees. Transforming the minimax problem into an orthant submodular function in $\mathbf{S}$ motivates us to consider a max-min-max submodular optimization problem and to investigate a two-layer subgradient-greedy procedure that approximately solves this generalization. Numerical experiments on Markov chains for the Curie-Weiss and Bernoulli-Laplace models illustrate the practicality of the proposed algorithms and reveal sparse optimal structures in these examples.
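To make the inner objective concrete, the following minimal sketch evaluates the worst-case loss $\max_{P\in\mathcal B} D_{\mathrm{KL}}^{\pi}(P\|Q)$ for a fixed candidate $Q$. It assumes the commonly used definition $D_{\mathrm{KL}}^{\pi}(P\|Q)=\sum_x \pi(x)\sum_y P(x,y)\log\frac{P(x,y)}{Q(x,y)}$, which the abstract does not spell out. The function names `pi_weighted_kl` and `worst_case_loss` and the two-state example are hypothetical and used for illustration only. The sketch does not implement the factorizable class $\mathcal F(\mathbf S)$ or the projected subgradient algorithm.

```python
import numpy as np


def pi_weighted_kl(P, Q, pi):
    """pi-weighted KL divergence between transition matrices P and Q.

    Assumed definition: sum_x pi(x) sum_y P(x,y) log(P(x,y) / Q(x,y)),
    with the convention 0 * log(0 / q) = 0.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    pi = np.asarray(pi, dtype=float)
    weights = pi[:, None] * P          # pi(x) * P(x, y)
    mask = weights > 0                 # entries that contribute to the sum
    if np.any(Q[mask] <= 0):
        # P puts mass where Q does not: the divergence is infinite.
        return float("inf")
    return float(np.sum(weights[mask] * np.log(P[mask] / Q[mask])))


def worst_case_loss(family, Q, pi):
    """Inner maximum over a finite family of transition matrices."""
    return max(pi_weighted_kl(P, Q, pi) for P in family)


# Hypothetical two-state example: both matrices are symmetric, hence
# doubly stochastic, so the uniform distribution is stationary for each.
pi = np.array([0.5, 0.5])
P1 = np.array([[0.9, 0.1], [0.1, 0.9]])
P2 = np.array([[0.5, 0.5], [0.5, 0.5]])
loss = worst_case_loss([P1, P2], Q=P2, pi=pi)
print(loss)
```

In the minimax problem this quantity would be minimized over $Q\in\mathcal F$; here $Q$ is simply fixed to `P2` to show how the objective is evaluated.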
