Pareto-type finite-block optimality for source codes: a constrained Markov example

We study a Pareto-type notion of finite-block optimality for injective source codes, where two codes are compared through the full sequence of expected block lengths. As a concrete and fully analyzable test case, we revisit the four-symbol constrained Markov source introduced by Dalai and Leonardi in their "meaningful example" on constrained-source decodability. For each admissible nonempty string $u=x_1^m \in \mathscr{A} \subset \mathscr{X}^+$, let $$ K(u):=-\log_2 \mathbb{P}(X_1^m=u) $$ denote its information cost. We construct a canonical injective binary mapping $C:\mathscr{A} \to \{0,1\}^+$ by ordering admissible strings by increasing $K(u)$, then by length and lexicographic order, and assigning binary strings in shortlex order. For the length-$n$ block $X_1^n$ we prove $$ \mathbb{E}[|C(X_1)|]=\tfrac32, \qquad \mathbb{E}[|C(X_1^n)|]<\tfrac32\,n\quad (n\ge 2). $$ Moreover, for every fixed $$ 0<c<\frac{\sqrt{2}}{18\sqrt{\pi}} $$ we have $$ \mathbb{E}[|C(X_1^n)|]\le \tfrac32\,n-\frac{c}{\sqrt n} $$ for all sufficiently large $n$. Thus, for this source, the reversible Dalai-Leonardi code is not Pareto-optimal with respect to finite-block average length. The proof is based on an exact enumeration of admissible strings by information cost and on a shortlex gap identity implying that each cost class splits evenly between lengths $K(u)-1$ and $K(u)$. The example is simple, but it already exhibits the kind of finite-block Pareto comparison that seems natural for injective source coding under source constraints.
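The ordering-and-assignment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy source below (memoryless, three symbols with dyadic probabilities, truncated to strings of length at most 2) is hypothetical and is not the Dalai-Leonardi constrained Markov source, and the names `SYMBOL_COST`, `info_cost`, `shortlex_binary`, `canonical_code` and `expected_length` are introduced here for illustration. The sketch relies on the fact that the $i$-th nonempty binary string in shortlex order ($0, 1, 00, 01, \dots$) is the binary expansion of $i+1$ with its leading 1 removed.

```python
from itertools import product

# Hypothetical toy source (an assumption, NOT the Dalai-Leonardi source):
# memoryless symbols with P(a)=1/2, P(b)=P(c)=1/4, stored as costs -log2 P.
SYMBOL_COST = {"a": 1, "b": 2, "c": 2}


def info_cost(u: str) -> int:
    """K(u) = -log2 P(u); exact integers because the toy source is dyadic."""
    return sum(SYMBOL_COST[x] for x in u)


def shortlex_binary(i: int) -> str:
    """Return the i-th (1-indexed) nonempty binary string in shortlex order."""
    if i < 1:
        raise ValueError("index must be at least 1")
    # bin(i + 1) looks like '0b1...'; drop the '0b' prefix and the leading 1.
    return bin(i + 1)[3:]


def canonical_code(strings):
    """Sort by (K(u), length, lexicographic order), then assign shortlex words."""
    order = sorted(strings, key=lambda u: (info_cost(u), len(u), u))
    return {u: shortlex_binary(i) for i, u in enumerate(order, start=1)}


def expected_length(code, n: int) -> float:
    """E[|C(X_1^n)|] under the toy source, summing over source strings of length n."""
    return sum(2.0 ** -info_cost(u) * len(w) for u, w in code.items() if len(u) == n)


# Truncation to lengths 1 and 2 is an assumption made to keep the example finite;
# the construction in the text ranks admissible strings of every length.
strings = ["".join(t) for m in (1, 2) for t in product("abc", repeat=m)]
code = canonical_code(strings)

for u, w in code.items():
    print(u, "->", w)
print("E|C(X_1)| =", expected_length(code, 1))
```

Because distinct ranks receive distinct binary words, the resulting map is injective by construction. It need not be prefix-free, which is consistent with the abstract's focus on injective codes.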
