This paper proposes a new lossless data compression scheme, the asymmetric encoding-decoding scheme (AEDS), which can be regarded as a generalization of tANS (the tabled variant of asymmetric numeral systems). In the AEDS, a data sequence $\mathbf{s}=s_1s_2\cdots s_n$ is encoded in backward order, $s_t$ for $t=n, \cdots, 2, 1$, while $\mathbf{s}$ is decoded in forward order, $s_t$ for $t=1, 2, \cdots, n$, in the same way as in tANS. However, the code class of the AEDS is much broader than that of tANS. For i.i.d.~sources, we show that an AEDS with 2 states (resp.~5 states) can attain a shorter average code length than the Huffman code if a child of the root in the Huffman code tree has a probability weight larger than 0.61803 (resp.~0.56984). Furthermore, we derive several upper bounds on the average code length of the AEDS, which also hold for tANS, and we show that the average code length of the optimal AEDS and tANS with $N$ states converges to the source entropy at rate $O(1/N)$ as $N$ increases.
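The backward-encoding, forward-decoding convention that the AEDS shares with tANS can be illustrated with a toy example. The following is a minimal sketch of a plain tANS-style coder, not of the AEDS itself. The function names (`build_tables`, `encode_seq`, `decode_seq`), the naive contiguous symbol spread, and the integer symbol counts are all assumptions made for illustration and do not come from the paper.

```python
def build_tables(counts):
    """Build toy tANS tables. `counts` maps symbol -> integer count;
    the counts must sum to a power of two L (the number of states)."""
    L = sum(counts.values())
    assert L > 0 and L & (L - 1) == 0, "total count must be a power of two"
    # Naive spread (assumption): each symbol occupies a contiguous run of states.
    spread = [s for s, c in counts.items() for _ in range(c)]
    decode = {}          # state x in [L, 2L) -> (symbol, reduced state xs)
    encode = {}          # (symbol, reduced state xs) -> state x in [L, 2L)
    nxt = dict(counts)   # next reduced state per symbol, running over [c, 2c)
    for i, s in enumerate(spread):
        x = L + i
        xs = nxt[s]
        nxt[s] += 1
        decode[x] = (s, xs)
        encode[(s, xs)] = x
    return L, decode, encode


def encode_seq(seq, counts):
    """Encode `seq` in backward order; return (final state, emitted bits)."""
    L, _, enc = build_tables(counts)
    x = L                          # fixed initial state known to the decoder
    bits = []
    for s in reversed(seq):        # s_n first, s_1 last
        c = counts[s]
        while x >= 2 * c:          # shift out low bits until x lies in [c, 2c)
            bits.append(x & 1)
            x >>= 1
        x = enc[(s, x)]
    return x, bits


def decode_seq(x, bits, n, counts):
    """Decode n symbols in forward order from the final encoder state."""
    L, dec, _ = build_tables(counts)
    bits = list(bits)              # copy; bits are consumed last-in, first-out
    out = []
    for _ in range(n):             # s_1 first, s_n last
        s, x = dec[x]
        out.append(s)
        while x < L:               # read bits back until x lies in [L, 2L)
            x = (x << 1) | bits.pop()
    return out


counts = {"a": 5, "b": 2, "c": 1}  # hypothetical source model, L = 8 states
message = list("abacaabaa")
state, bits = encode_seq(message, counts)
assert decode_seq(state, bits, len(message), counts) == message
```

Because the encoder consumes the sequence from the last symbol to the first and the decoder pops bits in reverse order of emission, the first symbol the decoder recovers is $s_1$. The sketch makes no attempt to optimize the symbol spread, which a practical coder would tune to reduce the average code length.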