Universal coders encode individual sequences without assuming a known source distribution. In this setting, uniformly generated sequences are the most difficult test case: the source simulates pure randomness, contains no exploitable bias, and forces a frequency-estimating universal coder to infer the empirical composition entirely from the sequence itself. This paper reports that a Set Shaping Theory (SST) transformation systematically reduces the average universal coding length of uniformly generated sequences below the Krichevsky-Trofimov baseline N H_0(s) + R_KT(s). The transformation maps each input sequence s in A^N to an expanded sequence f(s) in A^(N+1) and stores the transformation index in the additional symbol so that the mapping remains reversible. The comparison sets this exact baseline against the shaped score (N+1)H_0(f(s)) + R_KT(f(s)), where H_0 is computed from the empirical frequencies of each individual sequence. The same unified transformation also yields reductions across distinct compression architectures, including adaptive arithmetic coding, enumerative coding, LZ78, adaptive Huffman coding, and adaptive ANS. These results support the interpretation of SST as a representation-level preprocessing layer that can structurally improve existing universal coders without modifying their internal coding mechanisms. All results reported in the article can be reproduced with the simulator available at https://sst-simulator.github.io/Set-Shaping-Theory-Simulator/.
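To make the baseline score concrete, the following is a minimal sketch of how N H_0(s) + R_KT(s) could be evaluated for a single sequence. It assumes, since the abstract does not define it, that R_KT(s) is the pointwise redundancy of the Krichevsky-Trofimov (Dirichlet-1/2) mixture code relative to N H_0(s), so that the sum equals the KT code length -log2 P_KT(s). The function names are hypothetical and are not taken from the simulator. The SST transformation f itself is not sketched here; given an implementation of f, the shaped score would be obtained by applying the same function to f(s).

```python
import math
from collections import Counter


def empirical_entropy_bits(seq):
    """Zero-order empirical entropy H_0(s), in bits per symbol."""
    n = len(seq)
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def kt_code_length_bits(seq, alphabet_size):
    """-log2 P_KT(s) for the KT estimator over an alphabet of the given size.

    Uses the closed form
        P_KT(s) = Gamma(k/2) / Gamma(N + k/2) * prod_i Gamma(n_i + 1/2) / Gamma(1/2),
    where n_i are the symbol counts (absent symbols contribute a factor of 1).
    """
    n = len(seq)
    k = alphabet_size
    log_p = math.lgamma(k / 2) - math.lgamma(n + k / 2)
    for c in Counter(seq).values():
        log_p += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return -log_p / math.log(2)


def kt_redundancy_bits(seq, alphabet_size):
    """Assumed definition: R_KT(s) = -log2 P_KT(s) - N * H_0(s)."""
    return kt_code_length_bits(seq, alphabet_size) - len(seq) * empirical_entropy_bits(seq)


def baseline_score_bits(seq, alphabet_size):
    """N * H_0(s) + R_KT(s); under the assumption above this equals the KT code length."""
    n_h0 = len(seq) * empirical_entropy_bits(seq)
    return n_h0 + kt_redundancy_bits(seq, alphabet_size)
```

For the binary sequence "01", the KT probability is (1/2)(1/4) = 1/8, so the score is 3 bits: 2 bits from N H_0(s) and 1 bit of redundancy under the assumed definition.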