In the classical lossy source coding problem, one encodes long blocks of source symbols, which enables the distortion to approach the ultimate Shannon limit. Such a block-coding approach introduces large delays, which is undesirable in many delay-sensitive applications. We consider the zero-delay case, where the goal is to encode and decode a finite-alphabet Markov source without any delay. It has been shown that this problem lends itself to stochastic control techniques, which lead to existence, structural, and general structural approximation results. However, these techniques have so far resulted only in computationally prohibitive algorithmic implementations for code design. To address this problem, we present a reinforcement learning design algorithm and rigorously prove its asymptotic optimality. In particular, we show that a quantized Q-learning algorithm can be used to obtain a near-optimal coding policy for this problem. The proof builds on recent results on quantized Q-learning for weakly Feller controlled Markov chains, whose application necessitates the development of supporting technical results on regularity and stability properties, as well as on relating the optimal solutions of the discounted-cost and average-cost infinite-horizon problems. These theoretical results are supported by simulations.
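To make the quantized Q-learning idea concrete, the following is a minimal illustrative sketch, not the coding scheme analyzed in the paper: a continuous state is quantized to a finite grid, and standard Q-learning with visit-count step sizes is run on the quantized model to extract a greedy (cost-minimizing) policy. The dynamics in step(), the grid size, the action set, and the discount factor are all hypothetical placeholders.

```python
import numpy as np

# Illustrative sketch of quantized Q-learning on a generic continuous-state MDP.
# All problem data here are toy placeholders, not the zero-delay coding problem.

rng = np.random.default_rng(0)

N_BINS = 20          # number of quantization bins for a state in [0, 1]
N_ACTIONS = 2        # finite action set
BETA = 0.95          # discount factor
N_STEPS = 200_000    # length of a single exploration trajectory

def quantize(x):
    """Map a continuous state x in [0, 1] to a bin index."""
    return min(int(x * N_BINS), N_BINS - 1)

def step(x, u):
    """Toy controlled dynamics standing in for the true Markov chain."""
    cost = (x - 0.5 * u) ** 2
    x_next = np.clip(0.7 * x + 0.3 * u * rng.random()
                     + 0.05 * rng.standard_normal(), 0.0, 1.0)
    return cost, x_next

Q = np.zeros((N_BINS, N_ACTIONS))
visits = np.zeros((N_BINS, N_ACTIONS))

x = rng.random()
for _ in range(N_STEPS):
    s = quantize(x)
    u = rng.integers(N_ACTIONS)          # exploring policy: uniform random actions
    cost, x_next = step(x, u)
    s_next = quantize(x_next)
    visits[s, u] += 1
    alpha = 1.0 / visits[s, u]           # visit-count step size
    target = cost + BETA * Q[s_next].min()
    Q[s, u] += alpha * (target - Q[s, u])
    x = x_next

policy = Q.argmin(axis=1)                # greedy policy on the quantized states
print("learned quantized policy:", policy)
```

The quantized chain plays the role of a finite approximating model; under the weak Feller and stability conditions discussed in the paper, refining the quantization yields near-optimal policies for the original problem.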