We study a paradigm of coding for compression of the natural numbers via the zeta distribution and develop a statistical-mechanical interpretation, both in terms of Hagedorn systems and in terms of a Bose gas whose energy levels are the logarithms of the prime numbers. We also propose a simple coding scheme for the zeta distribution that nearly achieves the ideal code length. For block coding of vectors of natural numbers, we derive the micro-canonical entropy function and demonstrate its asymptotic linearity, which implies that its behavior is analogous to that of a Hagedorn system. We also derive the large deviations rate function and provide a formula for the best coding parameter in the large deviations sense. We show that, because of the Hagedorn-type phase transition, there is only partial equivalence of ensembles, owing to the degeneration of the domain of the partition function.
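To make the notion of ideal code length concrete, the following minimal sketch computes -log2 P(n) for the zeta distribution P(n) = n^(-s) / zeta(s) and compares it with the length of the Elias gamma code. The Elias gamma code is a standard universal integer code used here purely for illustration; it is an assumption of this sketch and not necessarily the scheme proposed in the paper. The function names, the parameter value s = 2, and the truncation length `terms` are likewise hypothetical choices.

```python
import math


def zeta(s, terms=1000):
    """Approximate the Riemann zeta function for real s > 1.

    Uses a truncated sum over n < terms plus the leading
    Euler-Maclaurin tail correction (integral term and half-term).
    """
    if s <= 1:
        raise ValueError("s must exceed 1 for the series to converge")
    head = sum(n ** -s for n in range(1, terms))
    tail = terms ** (1 - s) / (s - 1) + 0.5 * terms ** -s
    return head + tail


def ideal_code_length(n, s):
    """Ideal code length in bits, -log2 P(n), with P(n) = n^(-s) / zeta(s)."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    return s * math.log2(n) + math.log2(zeta(s))


def elias_gamma(n):
    """Elias gamma codeword of n >= 1 as a bit string (illustrative code only)."""
    if n < 1:
        raise ValueError("n must be a natural number (n >= 1)")
    binary = bin(n)[2:]                      # binary expansion, leading bit is 1
    return "0" * (len(binary) - 1) + binary  # unary length prefix, then the bits


if __name__ == "__main__":
    s = 2.0  # hypothetical parameter choice
    for n in (1, 2, 5, 17, 100, 1000):
        print(n, round(ideal_code_length(n, s), 3), len(elias_gamma(n)))
```

The gamma codeword for n has 2*floor(log2 n) + 1 bits, so its length grows like 2 log2 n, the same leading-order growth as the ideal length with s = 2. This is why it serves as a convenient reference point in this sketch.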