Set Shaping Theory (SST) moves beyond the classical fixed-space model by constructing bijective mappings from the original sequence set into structured regions of a larger sequence space. These shaped subsets are characterized by a reduced average information content, measured by the product of the empirical entropy and the sequence length, a quantity that represents the universal coding limit when the source distribution is unknown. For a sequence s of length N mapped to f(s) of length N + k, the shaping condition is $(N + k)H_0(f(s)) < N H_0(s)$. The principal experimental difficulty in applying SST to non-uniform sequences arises from the need to order the sequences of both the original and the transformed sets according to their information content. An exact ordering of these sets has exponential complexity, which makes a direct implementation impractical. In this article, we show that this obstacle can be overcome by performing an approximate but informative ordering that preserves the structural requirements of SST while achieving the shaping gain predicted by the theory. This result extends previous experimental findings obtained for uniformly distributed sequences and demonstrates that the shaping advantage of SST persists for non-uniform sequences. Finally, to ensure full reproducibility, the software implementing the proposed method has been made publicly available on GitHub, enabling independent verification of the results reported in this work.
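The information-content measure used above, the product of sequence length and zero-order empirical entropy, can be illustrated with a minimal Python sketch. The function names `empirical_entropy`, `information_content`, and `exact_ordering` are hypothetical and chosen only for illustration; this is not the authors' published software, and it does not implement the SST mapping f or the approximate ordering. It only shows how N·H₀(s) can be computed and why an exact ordering by this quantity becomes infeasible: the brute-force version has to enumerate every sequence.

```python
from collections import Counter
from itertools import product
from math import log2


def empirical_entropy(seq):
    """Zero-order empirical entropy H0 of seq, in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    # Sum of p * log2(1/p) over the observed symbol frequencies p = c / n.
    return sum((c / n) * log2(n / c) for c in counts.values())


def information_content(seq):
    """Length times empirical entropy, N * H0(seq), in bits."""
    return len(seq) * empirical_entropy(seq)


def exact_ordering(alphabet, n):
    """All len(alphabet)**n sequences of length n, sorted by N * H0.

    Illustrative brute force only: the enumeration is exponential in n,
    so this is usable for toy sizes and nothing more.
    """
    return sorted(product(alphabet, repeat=n), key=information_content)


if __name__ == "__main__":
    # Two symbols, each with frequency 1/2: H0 = 1 bit, so N * H0 = 4 bits.
    print(information_content("aabb"))
    ordered = exact_ordering("ab", 4)
    # 2**4 = 16 sequences had to be enumerated and sorted.
    print(len(ordered))
```

Even for a binary alphabet the number of sequences doubles with each additional symbol, which is the cost the approximate ordering described in the abstract is meant to avoid.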