Lossy data compression lies at the heart of modern communication and storage systems. Shannon's rate-distortion theory gives the fundamental limit on how far a source can be compressed at a given fidelity, but that limit is asymptotic: it is attained only as the block length tends to infinity, which never happens in practice. We present a self-contained tutorial on rate-distortion theory for the simplest non-trivial source: a Bernoulli$(p)$ sequence with Hamming distortion. We derive the classical rate-distortion function $R(D) = H(p) - H(D)$ for $0 \le D \le \min(p, 1-p)$, where $H(\cdot)$ denotes the binary entropy function, from first principles, illustrate its computation via the Blahut-Arimoto algorithm, and then develop the finite block length refinements that characterize how the minimum achievable rate approaches the Shannon limit as the block length $n$ grows. The central quantity in this refinement is the \emph{rate-distortion dispersion} $V(D)$, which governs the $O(1/\sqrt{n})$ rate penalty incurred at finite block lengths. All theoretical developments are illustrated with numerical examples and figures generated by the accompanying Python scripts.
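As a quick illustration of the closed form $R(D) = H(p) - H(D)$ and of how the Blahut-Arimoto algorithm reproduces it, the following is a minimal Python sketch. It is not taken from the accompanying scripts: the function names (`binary_entropy`, `rate_distortion_bernoulli`, `blahut_arimoto`), the slope parameter `beta`, and the fixed iteration count are all assumptions made for illustration, and rates are measured in bits.

```python
import math


def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0 by convention."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def rate_distortion_bernoulli(p, D):
    """Closed form R(D) = H(p) - H(D) for D < min(p, 1-p); zero otherwise."""
    if D >= min(p, 1.0 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(D)


def blahut_arimoto(p, beta, iters=500):
    """Return one (rate, distortion) point for a Bernoulli(p) source.

    beta > 0 is a hypothetical name for the Lagrange (slope) parameter:
    larger beta penalizes distortion more and yields a smaller D.
    The iteration count is fixed for simplicity (no convergence test).
    """
    px = [1.0 - p, p]   # source distribution over x in {0, 1}
    q = [0.5, 0.5]      # initial reproduction distribution over y
    for _ in range(iters):
        # Step 1: test channel Q(y|x) proportional to q(y) * exp(-beta * d(x, y)),
        # where d is Hamming distortion (1 if x != y, else 0).
        cond = []
        for x in range(2):
            w = [q[y] * math.exp(-beta * (x != y)) for y in range(2)]
            s = sum(w)
            cond.append([wi / s for wi in w])
        # Step 2: q(y) becomes the output marginal induced by Q(y|x).
        q = [sum(px[x] * cond[x][y] for x in range(2)) for y in range(2)]
    # Expected Hamming distortion and mutual information of the final joint.
    D = sum(px[x] * cond[x][y] * (x != y) for x in range(2) for y in range(2))
    R = sum(
        px[x] * cond[x][y] * math.log2(cond[x][y] / q[y])
        for x in range(2)
        for y in range(2)
    )
    return R, D


if __name__ == "__main__":
    p = 0.3
    # For this source the optimal backward channel is a BSC(D), so the slope
    # beta = ln((1 - D) / D) should land on distortion D (here D = 0.1).
    R_ba, D_ba = blahut_arimoto(p, math.log(9.0))
    print("Blahut-Arimoto:", R_ba, D_ba)
    print("Closed form:   ", rate_distortion_bernoulli(p, D_ba))
```

Sweeping `beta` over a range of positive values traces out the whole $R(D)$ curve; each point returned by the iteration should agree with the closed form evaluated at the same distortion, up to numerical tolerance.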