We study a family of processes generated according to sequential probability assignments induced by the LZ78 universal compressor. We characterize entropic and distributional properties of these processes, including their entropy and relative entropy rates, the finite-state compressibility and log loss of their realizations, and the empirical distributions that they induce. Though not quite stationary, these sources are "almost stationary and ergodic": like stationary and ergodic processes, they satisfy a Shannon-McMillan-Breiman-type property, in that the normalized log probability of their realizations converges almost surely to their entropy rate. Further, they are locally "almost i.i.d." in the sense that the finite-dimensional empirical distributions of their realizations converge almost surely to a deterministic i.i.d. law. Unlike stationary ergodic sources, however, the finite-state compressibility of their realizations is almost surely strictly larger than their entropy rate, with the difference given by a "Jensen gap". We present simulations demonstrating these theoretical results. Such sources allow one to gauge the performance of sequential probability models, both classical and deep learning-based, on non-Markovian, non-stationary data. To this end, we apply realizations of the LZ78 source to the study of in-context learning in transformer models.
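To make the generative mechanism concrete, below is a minimal sketch of how one might sample such a process: an LZ78 prefix tree is grown as symbols are emitted, each node keeps symbol counts, and the next symbol is drawn from a count-based sequential probability assignment at the current node, restarting at the root whenever a new phrase (leaf) is created. The add-gamma smoothing rule, the class name `LZ78Source`, and the parameter `gamma` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

class LZ78Source:
    """Sketch of a sampler driven by an LZ78-based sequential probability
    assignment (SPA). Each LZ78 tree node stores symbol counts; the
    next-symbol law at a node is an add-gamma smoothed empirical
    distribution (an assumed smoothing rule, for illustration only)."""

    def __init__(self, alphabet_size: int, gamma: float = 0.5, seed: int = 0):
        self.A = alphabet_size
        self.gamma = gamma
        self.rng = np.random.default_rng(seed)
        # Tree storage: per-node symbol counts and child pointers.
        self.counts = [np.zeros(alphabet_size)]  # node 0 is the root
        self.children = [dict()]
        self.node = 0  # current position within the phrase being parsed

    def _next_symbol_probs(self, node: int) -> np.ndarray:
        c = self.counts[node]
        return (c + self.gamma) / (c.sum() + self.A * self.gamma)

    def step(self) -> int:
        """Sample one symbol, update counts, and advance the LZ78 parsing."""
        p = self._next_symbol_probs(self.node)
        x = int(self.rng.choice(self.A, p=p))
        self.counts[self.node][x] += 1
        if x in self.children[self.node]:
            # Continue inside the current phrase.
            self.node = self.children[self.node][x]
        else:
            # Phrase ends: add a new leaf and restart parsing at the root.
            self.children[self.node][x] = len(self.counts)
            self.counts.append(np.zeros(self.A))
            self.children.append(dict())
            self.node = 0
        return x

    def sample(self, n: int) -> np.ndarray:
        return np.array([self.step() for _ in range(n)])


if __name__ == "__main__":
    src = LZ78Source(alphabet_size=2, gamma=0.5)
    x = src.sample(10_000)
    print("empirical symbol frequencies:", np.bincount(x, minlength=2) / len(x))
```

Under these assumptions, realizations of such a sampler could serve as the non-Markovian, non-stationary test data mentioned above, e.g., as prompts for probing in-context learning in transformer models.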