We study a family of processes generated according to sequential probability assignments induced by the LZ78 universal compressor. We characterize their entropic and distributional properties, such as their entropy and relative entropy rates, the finite-state compressibility and log loss of their realizations, and the empirical distributions that they induce. Though not quite stationary, these sources are "almost stationary and ergodic": like stationary and ergodic processes, they satisfy a Shannon-McMillan-Breiman-type property, in that the normalized log probability of their realizations converges almost surely to their entropy rate. Further, they are locally "almost i.i.d." in the sense that the finite-dimensional empirical distributions of their realizations converge almost surely to a deterministic i.i.d. law. However, unlike stationary ergodic sources, the finite-state compressibility of their realizations is almost surely strictly larger than their entropy rate, by a "Jensen gap". We present simulations demonstrating these theoretical results. These sources make it possible to gauge the performance of sequential probability models, both classical and deep learning-based, on non-Markovian, non-stationary data. In particular, we use realizations of the LZ78 source to study in-context learning in transformer models.
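To make the generating mechanism concrete, the sketch below samples from, and scores against, an LZ78-style sequential probability assignment. It is a minimal illustration, not the authors' construction: it assumes that each node of the LZ78 phrase tree keeps per-symbol counts N(a) and predicts the next symbol with the Dirichlet(gamma)-smoothed estimate (N(a) + gamma) / (N + k * gamma), where k is the alphabet size. The names `LZ78SPA`, `sample`, and `log_loss` and the parameters `alphabet_size` and `gamma` are hypothetical.

```python
import math
import random


class LZ78SPA:
    """Hypothetical LZ78 sequential probability assignment (sketch only).

    Assumption: every node of the LZ78 phrase tree stores symbol counts N(a),
    and the next-symbol probability is (N(a) + gamma) / (N + k * gamma).
    """

    def __init__(self, alphabet_size=2, gamma=0.5):
        self.k = alphabet_size
        self.gamma = gamma
        self.counts = [[0] * alphabet_size]  # counts[node][a]; node 0 is the root
        self.children = {}                   # (node, symbol) -> child node id
        self.node = 0                        # current position in the tree

    def probs(self):
        """Dirichlet-smoothed next-symbol distribution at the current node."""
        c = self.counts[self.node]
        total = sum(c) + self.k * self.gamma
        return [(c[a] + self.gamma) / total for a in range(self.k)]

    def update(self, a):
        """Record symbol a and advance along the LZ78 parsing."""
        self.counts[self.node][a] += 1
        key = (self.node, a)
        if key in self.children:
            self.node = self.children[key]       # continue the current phrase
        else:
            self.children[key] = len(self.counts)  # phrase ends: add a new leaf
            self.counts.append([0] * self.k)
            self.node = 0                        # next phrase restarts at the root


def sample(n, alphabet_size=2, gamma=0.5, seed=None):
    """Draw n symbols sequentially from the assumed LZ78-style source."""
    rng = random.Random(seed)
    spa = LZ78SPA(alphabet_size, gamma)
    out = []
    for _ in range(n):
        a = rng.choices(range(alphabet_size), weights=spa.probs())[0]
        spa.update(a)
        out.append(a)
    return out


def log_loss(seq, alphabet_size=2, gamma=0.5):
    """Normalized log loss in bits/symbol of a non-empty seq under a fresh LZ78SPA."""
    spa = LZ78SPA(alphabet_size, gamma)
    total = 0.0
    for a in seq:
        total -= math.log2(spa.probs()[a])
        spa.update(a)
    return total / len(seq)
```

For example, `x = sample(10_000, seed=0)` followed by `log_loss(x)` gives the normalized negative log probability of that realization under the assumed source, which is the quantity whose almost-sure convergence the Shannon-McMillan-Breiman-type property describes. Passing a sequence drawn from some other model to `log_loss` is one simple way to compare predictors on the same data.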