Asymmetric Numeral Systems (ANS) is a class of entropy encoders that has had an immense impact on data compression, displacing arithmetic and Huffman coding. Several authors have studied it, but the precise asymptotics of its redundancy (relative to the entropy) were not fully understood. We obtain optimal bounds for the redundancy of the tabled ANS (tANS), the most popular ANS variant. Given a sequence $a_1,a_2,\ldots,a_n$ of symbols from an alphabet $\{0,1,\ldots,σ-1\}$ such that each symbol $a$ occurs in it $f_a$ times and $n=2^r$, the tANS encoder that uses Duda's ``precise initialization'' to fill the tANS tables transforms this sequence into a bit string of the following length (the frequencies are not included in the encoding): $\sum\limits_{a\in[0..σ)}f_a\cdot\log\frac{n}{f_a}+O(σ+r)$, where the $O(σ+r)$ term is at most $σ\log e+r$. The $r$-bit term is an artifact inherent to ANS; the rest incurs a redundancy of $O(\frac{σ}{n})$ bits per symbol. We complement this with examples showing that an $Ω(σ+r)$ redundancy is necessary, and we argue that similar examples exist for most reasonable initialization methods for tANS. Thus, we refute Duda's conjecture that the redundancy is $O(\frac{σ}{n^2})$ bits per symbol. We also propose a variant of the range ANS (rANS), called rANS with fixed accuracy and parameterized by $k\ge 1$, which under certain conditions might be faster than the standard rANS because it avoids slow explicit division operations. We bound the redundancy of our rANS variant by $\frac{n}{2^k-1}\log e+r+k$.
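To make the tANS bound concrete, the following minimal sketch evaluates the entropy term $\sum_a f_a\log\frac{n}{f_a}$ and the stated upper bound $σ\log e+r$ on the additive term for a given frequency vector. It is only an illustration of the arithmetic, not an encoder: the function name `tans_length_bound` and the sample frequencies are hypothetical, and logarithms are assumed to be base 2 since the lengths are measured in bits.

```python
import math

def tans_length_bound(freqs):
    """Evaluate the entropy term and the stated upper bound, both in bits.

    freqs[a] is the number of occurrences f_a of symbol a; the counts
    must sum to n = 2**r. Base-2 logarithms are an assumption here.
    """
    n = sum(freqs)
    if n <= 0 or n & (n - 1):
        raise ValueError("the total count n must be a power of two")
    r = n.bit_length() - 1          # n = 2**r
    sigma = len(freqs)              # alphabet size
    # Entropy term: sum over symbols of f_a * log(n / f_a).
    entropy_bits = sum(f * math.log2(n / f) for f in freqs if f > 0)
    # Bound on the additive O(sigma + r) term: sigma * log(e) + r.
    overhead_bits = sigma * math.log2(math.e) + r
    return entropy_bits, entropy_bits + overhead_bits

# Hypothetical example: sigma = 4 symbols, n = 16 = 2**4.
freqs = [8, 4, 2, 2]
entropy_bits, bound = tans_length_bound(freqs)
print(f"entropy term: {entropy_bits:.3f} bits, upper bound: {bound:.3f} bits")
```

For these sample frequencies the entropy term is exactly 28 bits, and the bound adds $4\log e+4$ bits on top of it. Setting aside the $r$-bit term, the remaining overhead per symbol is $σ\log e/n$, in line with the $O(\frac{σ}{n})$ rate stated above.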