It is well known that universal codes, which estimate the entropy rate consistently, exist for stationary ergodic sources over finite alphabets but not over countably infinite ones. We generalize universal coding to the problem of universal densities with respect to a fixed reference measure on a countably generated measurable space. We show that universal densities, which estimate the differential entropy rate consistently, exist for finite reference measures. Thus, in some sense, finite alphabets are not necessary. To exhibit a universal density, we adapt the non-parametric differential (NPD) entropy rate estimator of Feutrill and Roughan. Our modification is analogous to Ryabko's modification of prediction by partial matching (PPM), due to Cleary and Witten. Whereas Ryabko considered a mixture over Markov orders, we consider a mixture over quantization levels. Moreover, we demonstrate that any universal density induces a strongly consistent Ces\`aro mean estimator of the conditional density given an infinite past. This yields a universal predictor under the $0$-$1$ loss for a countable alphabet. Finally, we specialize universal densities to processes over the natural numbers and on the real line. We derive sufficient conditions for consistent estimation of the entropy rate with respect to infinite reference measures in these domains.
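The role of a mixture over quantization levels can be illustrated by a standard domination bound. The notation here is an assumption made for illustration and is not taken from the construction itself: $\pi_k$ stands for a hypothetical density built at quantization level $k$, and $w_k > 0$ are hypothetical weights with $\sum_{k \ge 1} w_k = 1$.

```latex
\[
  \pi(x_1^n) = \sum_{k \ge 1} w_k \, \pi_k(x_1^n)
  \quad\Longrightarrow\quad
  -\log \pi(x_1^n) \le -\log w_k - \log \pi_k(x_1^n)
  \quad \text{for every } k \ge 1,
\]
\[
  \frac{1}{n}\bigl[-\log \pi(x_1^n)\bigr]
  \le \frac{1}{n}\bigl[-\log \pi_k(x_1^n)\bigr] + \frac{-\log w_k}{n}.
\]
```

For each fixed $k$ the penalty $(-\log w_k)/n$ vanishes as $n \to \infty$, so under these assumptions the mixture is asymptotically at least as good as every individual level. The same argument applies to a mixture over Markov orders.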