A New Finite-Horizon Dynamic Programming Analysis of Nonanticipative Rate-Distortion Function for Markov Sources

This paper addresses the computation of a non-asymptotic lower bound, by means of the nonanticipative rate-distortion function (NRDF), on the discrete-time zero-delay variable-rate lossy compression problem for discrete Markov sources with per-stage, single-letter distortion. First, we derive a new information structure of the NRDF for Markov sources and single-letter distortions. Second, we derive new convexity results on the NRDF. These results allow us to use the Lagrange duality theorem to cast the problem as an unconstrained, partially observable, finite-time horizon stochastic dynamic programming (DP) algorithm. The DP is subject to a probabilistic state (belief state) that summarizes the past information about the reproduction symbols and takes values in a continuous state space. Instead of approximating the DP algorithm directly, we use the Karush-Kuhn-Tucker (KKT) conditions to find an implicit closed-form expression of the optimal control policy of the stochastic DP (i.e., the minimizing distribution of the NRDF). We then approximate the control policy and the cost-to-go function (a function of the rate) stage-wise via a novel dynamic alternating minimization (AM) approach, realized by an offline algorithm that operates using backward recursions and has provable convergence guarantees. We obtain the clean values of the aforementioned quantities using an online (forward) algorithm that operates for any finite-time horizon. Our methodology yields an approximation of the exact NRDF solution, which becomes near-optimal as the search space of the belief state becomes sufficiently large at each time stage. We corroborate our theoretical findings with simulation studies in which we apply our algorithms to time-varying and time-invariant binary Markov processes.

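For readers unfamiliar with alternating minimization in rate-distortion problems, the following minimal sketch may help. It is not the dynamic AM scheme of this paper: it is the classical Blahut-Arimoto iteration for the static, single-letter rate-distortion function of a memoryless source, which alternates between the two minimizing distributions for a fixed Lagrange multiplier. The function name `blahut_arimoto`, the multiplier `s`, and the binary example are assumptions introduced here for illustration only.

```python
import math


def blahut_arimoto(p_x, dist, s, n_iter=200):
    """Classical (static) Blahut-Arimoto iteration, illustrative sketch only.

    p_x  : source distribution, list of probabilities
    dist : distortion matrix, dist[x][y] >= 0
    s    : Lagrange multiplier (s >= 0) weighting distortion against rate
    Returns (rate in bits, expected distortion) at the point selected by s.
    """
    n_x, n_y = len(p_x), len(dist[0])
    q_y = [1.0 / n_y] * n_y  # initial reproduction marginal (uniform)
    cond = []
    for _ in range(n_iter):
        # Step 1: best test channel Q(y|x) for the current marginal q(y).
        cond = []
        for x in range(n_x):
            w = [q_y[y] * math.exp(-s * dist[x][y]) for y in range(n_y)]
            z = sum(w)
            cond.append([v / z for v in w])
        # Step 2: best marginal q(y) for the current test channel.
        q_y = [sum(p_x[x] * cond[x][y] for x in range(n_x)) for y in range(n_y)]

    # Mutual information I(X;Y) in bits and expected distortion E[d(X,Y)].
    rate = sum(
        p_x[x] * cond[x][y] * math.log2(cond[x][y] / q_y[y])
        for x in range(n_x)
        for y in range(n_y)
        if p_x[x] * cond[x][y] > 0.0
    )
    distortion = sum(
        p_x[x] * cond[x][y] * dist[x][y] for x in range(n_x) for y in range(n_y)
    )
    return rate, distortion


# Usage: uniform binary source with Hamming distortion (assumed example).
hamming = [[0.0, 1.0], [1.0, 0.0]]
rate, distortion = blahut_arimoto([0.5, 0.5], hamming, s=2.0)
print(f"rate = {rate:.4f} bits, distortion = {distortion:.4f}")
```

Sweeping `s` traces out points on the static rate-distortion curve. The approach described in the abstract differs in that the minimization is carried out stage-wise over a belief state within a backward DP recursion.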