Efficiency of ANS Entropy Encoders

Asymmetric Numeral Systems (ANS) is a class of entropy encoders that has had an immense impact on data compression, substituting arithmetic and Huffman coding. It has been studied by several authors, but the precise asymptotics of its redundancy (relative to the entropy) were not completely understood. We obtain optimal bounds for the redundancy of tabled ANS (tANS), the most popular ANS variant. Given a sequence $a_1,a_2,\ldots,a_n$ of symbols from an alphabet $\{0,1,\ldots,\sigma-1\}$ such that each symbol $a$ occurs in it $f_a$ times and $n=2^r$, the tANS encoder using Duda's ``precise initialization'' to fill the tANS tables transforms this sequence into a bit string of the following length (the frequencies are not included in the encoding): $\sum\limits_{a\in[0..\sigma)}f_a\cdot\log\frac{n}{f_a}+O(\sigma+r)$, where the $O(\sigma+r)$ term can be bounded by $\sigma\log e+r$. The $r$-bit term is an artifact inherent to ANS; the rest incurs a redundancy of $O(\frac{\sigma}{n})$ bits per symbol. We complement this with examples showing that an $\Omega(\sigma+r)$ redundancy is necessary, and we argue that similar examples exist for most reasonable initialization methods for tANS. Thus, we refute Duda's conjecture that the redundancy is $O(\frac{\sigma}{n^2})$ bits per symbol. We also propose a variant of range ANS (rANS), called rANS with fixed accuracy, parameterized by $k\ge 1$. In this variant, the integer division, which is unavoidable in rANS, is performed only when its result belongs to $[2^k..2^{k+1})$. Therefore, the division can be computed by faster methods, provided $k$ is small. We bound the redundancy of our rANS variant by $\frac{n}{2^k-1}\log e+r$.
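
To make the tANS setting above concrete, here is a minimal Python sketch of a bit-wise tANS encoder and decoder with a table of size $n$. The table is filled by what we assume Duda's precise initialization to be: symbol $a$ receives the $f_a$ slots whose keys $(i+\frac12)\cdot\frac{n}{f_a}$, $i\in[0..f_a)$, are smallest among all keys, with ties broken by symbol index. The function names, the initial state $x=n$, and the one-bit-at-a-time renormalization are our own assumptions, not taken from the paper. The script prints the emitted length plus the $r$ bits needed for the final state, next to the entropy term $\sum_a f_a\log\frac{n}{f_a}$ and the bound $\sigma\log e+r$ above it.

```python
import heapq
import math
from fractions import Fraction


def frequencies(seq, sigma):
    """Count f_a for each symbol a in [0..sigma)."""
    freqs = [0] * sigma
    for a in seq:
        freqs[a] += 1
    return freqs


def precise_spread(freqs, n):
    """Assumed form of Duda's precise initialization: slot j (state n + j)
    goes to the symbol owning the j-th smallest key (i + 1/2) * n / f_a."""
    heap = [(Fraction(n, 2 * f), a) for a, f in enumerate(freqs) if f > 0]
    heapq.heapify(heap)
    table = []
    for _ in range(n):
        key, a = heapq.heappop(heap)
        table.append(a)
        heapq.heappush(heap, (key + Fraction(n, freqs[a]), a))
    return table


def tans_encode(seq, freqs):
    """Return (emitted bits, final state); states live in [n, 2n)."""
    n = len(seq)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    table = precise_spread(freqs, n)
    enc = [[] for _ in freqs]  # enc[a][y - f_a] = state for y in [f_a, 2f_a)
    for j, a in enumerate(table):
        enc[a].append(n + j)
    x, bits = n, []  # hypothetical initial state x = n
    for a in seq:
        f = freqs[a]
        while x >= 2 * f:  # renormalize into [f_a, 2f_a), one bit at a time
            bits.append(x & 1)
            x >>= 1
        x = enc[a][x - f]
    return bits, x


def tans_decode(bits, x, freqs, n):
    """Invert tans_encode; ANS is LIFO, so symbols come out in reverse."""
    table = precise_spread(freqs, n)
    seen = [0] * len(freqs)
    rank = []  # rank[j] = occurrence index of slot j among slots of its symbol
    for a in table:
        rank.append(seen[a])
        seen[a] += 1
    bits, out = list(bits), []
    for _ in range(n):
        j = x - n
        a = table[j]
        x = freqs[a] + rank[j]  # back to [f_a, 2f_a)
        while x < n:  # read the emitted bits back, last one first
            x = (x << 1) | bits.pop()
        out.append(a)
    return out[::-1]


if __name__ == "__main__":
    seq = [0, 1, 0, 2, 0, 1, 0, 3, 0, 0, 1, 2, 0, 0, 1, 0]
    n, sigma = len(seq), 4
    r = n.bit_length() - 1
    freqs = frequencies(seq, sigma)
    bits, x = tans_encode(seq, freqs)
    assert tans_decode(bits, x, freqs, n) == seq
    entropy = sum(f * math.log2(n / f) for f in freqs if f > 0)
    print("encoded length (emitted + r):", len(bits) + r)
    print("entropy term:", round(entropy, 3))
    print("entropy + sigma*log2(e) + r:", round(entropy + sigma * math.log2(math.e) + r, 3))
```

The final state lies in $[n, 2n)$, so storing $x-n$ takes exactly $r$ bits, which is where the sketch's $r$-bit term comes from.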

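The fixed-accuracy idea for rANS can be illustrated with the following minimal Python sketch. It is built on our own assumptions and is not the paper's construction. We keep the state in the hypothetical interval $[2^k n, 2^{k+1} n)$, and before encoding symbol $a$ we shift out single bits until the state lies in $[2^k f_a, 2^{k+1} f_a)$. This guarantees that the quotient $\lfloor x/f_a\rfloor$ always lies in $[2^k..2^{k+1})$. Because that quotient has exactly $k+1$ bits, the hypothetical helper `small_quotient` computes it with $k+1$ shift-and-subtract steps instead of a general division. Apart from that, the sketch uses the standard rANS step $x \mapsto \lfloor x/f_a\rfloor\cdot n + c_a + (x \bmod f_a)$ with $c_a=\sum_{b<a} f_b$. Since $n=2^r$, multiplication and reduction by $n$ become shifts and masks.

```python
def small_quotient(x, f, k):
    """divmod(x, f) via k+1 shift-and-subtract steps; requires
    2**k * f <= x < 2**(k+1) * f, so the quotient has exactly k+1 bits."""
    q = 0
    for bit in range(k, -1, -1):
        if x >= f << bit:
            x -= f << bit
            q |= 1 << bit
    return q, x  # quotient and remainder


def rans_fa_encode(seq, freqs, k):
    """Bit-wise rANS with the state kept in [2^k n, 2^(k+1) n) (assumed layout)."""
    n = len(seq)
    r = n.bit_length() - 1
    assert n == 1 << r and k >= 1
    cum = [sum(freqs[:a]) for a in range(len(freqs))]  # c_a
    x, bits = n << k, []  # hypothetical initial state 2^k * n
    for a in seq:
        f = freqs[a]
        while x >= f << (k + 1):  # renormalize into [2^k f_a, 2^(k+1) f_a)
            bits.append(x & 1)
            x >>= 1
        q, rem = small_quotient(x, f, k)  # q is in [2^k, 2^(k+1))
        x = (q << r) + cum[a] + rem  # q * n + c_a + (x mod f_a)
    return bits, x


def rans_fa_decode(bits, x, freqs, k, n):
    """Invert rans_fa_encode; symbols are recovered in reverse order."""
    r = n.bit_length() - 1
    cum = [sum(freqs[:a]) for a in range(len(freqs))]
    symbol_of = [a for a, f in enumerate(freqs) for _ in range(f)]  # slot -> symbol
    bits, out = list(bits), []
    for _ in range(n):
        slot = x & (n - 1)  # x mod n
        a = symbol_of[slot]
        x = freqs[a] * (x >> r) + slot - cum[a]  # back to [2^k f_a, 2^(k+1) f_a)
        while x < n << k:  # read bits back until x >= 2^k n
            x = (x << 1) | bits.pop()
        out.append(a)
    return out[::-1]
```

For $k=1$ the quotient takes only the values 2 and 3, so `small_quotient` needs just two comparisons. Larger $k$ lowers the redundancy bound $\frac{n}{2^k-1}\log e$ but lengthens the quotient. In this layout, storing the final state costs $r+k$ bits, which is a property of the sketch and need not match the paper's accounting.
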