We study the problem of zero-delay coding of a Markov source over a noisy channel with feedback. We first formulate the problem as a Markov decision process (MDP) where the state is a previous belief term along with a finite memory of channel outputs and quantizers. We then approximate this state by marginalizing over all possible beliefs, so that our policies only use the finite-memory term to encode the source. Under an appropriate notion of predictor stability, we show that such policies are near-optimal for the zero-delay coding problem as the memory length increases. We also give sufficient conditions for predictor stability to hold, and propose a reinforcement learning algorithm to compute near-optimal finite-memory policies. These theoretical results are supported by simulations.
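To make the belief and finite-memory state described in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a finite source alphabet, a memoryless channel, and deterministic quantizers. Every name (`predictor_update`, `finite_memory_predictor`) and every numerical value is a hypothetical choice made for illustration. The sketch runs a standard Bayes predictor update over a window of past (quantizer, channel output) pairs, starting from two different priors, and measures how far apart the resulting beliefs are. A gap that shrinks with window length is the informal idea behind predictor stability.

```python
def predictor_update(belief, quantizer, y, channel, transition):
    """One Bayes step: condition on channel output y, then predict the next symbol.

    belief[x]        : P(X_t = x | past quantizers and channel outputs)
    quantizer[x]     : channel input assigned to source symbol x
    channel[m][y]    : P(Y_t = y | channel input m)
    transition[x][z] : P(X_{t+1} = z | X_t = x)
    """
    n = len(belief)
    # Weight each source symbol by the likelihood of the observed output.
    weighted = [belief[x] * channel[quantizer[x]][y] for x in range(n)]
    total = sum(weighted)
    if total == 0.0:
        raise ValueError("channel output has zero probability under this belief")
    posterior = [w / total for w in weighted]
    # Push the posterior through the Markov transition kernel.
    return [sum(posterior[x] * transition[x][z] for x in range(n)) for z in range(n)]


def finite_memory_predictor(prior, window, channel, transition):
    """Apply the update over a window of (quantizer, output) pairs from a fixed prior."""
    belief = list(prior)
    for quantizer, y in window:
        belief = predictor_update(belief, quantizer, y, channel, transition)
    return belief


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical toy model: 3-state source, binary symmetric channel, fixed quantizer.
transition = [[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]]
channel = [[0.9, 0.1],
           [0.1, 0.9]]
quantizer = [0, 0, 1]
window = [(quantizer, y) for y in [0, 1, 1, 0, 1, 0]]

prior_a = [1.0, 0.0, 0.0]
prior_b = [0.0, 0.0, 1.0]

# Gap between the two beliefs after using only the first k window entries.
gaps = []
for k in range(len(window) + 1):
    belief_a = finite_memory_predictor(prior_a, window[:k], channel, transition)
    belief_b = finite_memory_predictor(prior_b, window[:k], channel, transition)
    gaps.append(l1_distance(belief_a, belief_b))

for k, gap in enumerate(gaps):
    print(f"memory length {k}: L1 gap between beliefs = {gap:.6f}")
```

In this toy setting the transition matrix has strictly positive entries, so both beliefs have full support after one step and the gap drops below its initial value. Whether and how fast the gap vanishes in general depends on the source and channel, which is what the sufficient conditions for predictor stability address.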