In this paper, we present a new multi-scale method for calculating information content based on Shannon information (and Shannon entropy). The original method described by Claude E. Shannon, which is based on the logarithm of the probabilities of the elements, gives an upper limit to the information content of discrete patterns. In many cases, however (for example, for repeating patterns), this limit is inaccurate and does not approximate the true information content of the pattern closely enough. The new mathematical method presented here builds on Shannon's original function and provides a more accurate estimate of the (internal) information content of any discrete pattern. The method is tested on different data sets, and the results are compared with those of other methods, such as compression algorithms.
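To illustrate why the classic Shannon estimate can overshoot for repeating patterns, here is a minimal sketch of the baseline calculation only, not the multi-scale method of this paper. It assumes that symbol probabilities are estimated from each symbol's relative frequency within the pattern itself. Under that assumption, the total information is the sum of -log2 p over all positions, which equals N·H. The function names `shannon_information` and `zlib_bits` are hypothetical, and the zlib size is only a rough stand-in for the compression-based comparison mentioned above.

```python
import math
import random
import zlib
from collections import Counter


def shannon_information(pattern: str) -> float:
    """Baseline Shannon information content of a discrete pattern, in bits.

    Each position contributes -log2 p(symbol), where p is the symbol's
    relative frequency in the pattern (an assumption of this sketch).
    Summing over all positions gives N * H (length times entropy).
    """
    n = len(pattern)
    if n == 0:
        return 0.0
    counts = Counter(pattern)
    return sum(-c * math.log2(c / n) for c in counts.values())


def zlib_bits(pattern: str) -> int:
    """Size of the zlib-compressed pattern in bits (rough comparison only)."""
    return 8 * len(zlib.compress(pattern.encode("utf-8"), 9))


if __name__ == "__main__":
    periodic = "ab" * 500

    # Same symbol counts as `periodic`, but in a shuffled, non-periodic order.
    symbols = list(periodic)
    random.Random(0).shuffle(symbols)
    irregular = "".join(symbols)

    # The frequency-based estimate ignores symbol order: both give 1000.0 bits.
    print(shannon_information(periodic), shannon_information(irregular))

    # A compressor exploits the repetition, so the periodic pattern
    # compresses to far fewer bits than the Shannon estimate.
    print(zlib_bits(periodic), zlib_bits(irregular))
```

Because the estimate depends only on symbol frequencies, a strictly periodic pattern and a shuffled one with the same counts receive the same value. This is the gap between the Shannon upper limit and the true information content of repeating patterns that the abstract describes.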