In this paper, we propose a source coding scheme that represents data from unknown distributions through frequency and support information. Existing encoding schemes often achieve compression either at the cost of computational efficiency or by assuming that the data follow a known distribution. We exploit the structure that arises within the spatial representation and encode the run-lengths within this representation using Golomb coding. Through theoretical analysis, we show that our scheme achieves an overall bit rate that approaches the entropy without requiring a computationally complex encoding algorithm, and we verify these results through numerical experiments.
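As a rough illustration of the Golomb coding of run-lengths mentioned above, the following is a minimal sketch and not the scheme proposed in the paper. It assumes that the support information can be viewed as a binary indicator sequence whose runs of zeros are encoded. The function names (`zero_run_lengths`, `golomb_encode`, `golomb_decode`) and the choice of the parameter `m` are hypothetical and used only for illustration.

```python
def zero_run_lengths(bits):
    """Lengths of the zero runs that precede each 1 in a binary sequence.

    Assumption of this sketch: trailing zeros after the last 1 are ignored,
    so a decoder would need the sequence length from elsewhere.
    """
    runs, run = [], 0
    for b in bits:
        if b:
            runs.append(run)
            run = 0
        else:
            run += 1
    return runs


def golomb_encode(n: int, m: int) -> str:
    """Golomb codeword (as a bit string) for an integer n >= 0, parameter m >= 1."""
    if n < 0 or m < 1:
        raise ValueError("require n >= 0 and m >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"            # quotient in unary, terminated by a 0
    b = (m - 1).bit_length()        # ceil(log2(m))
    cutoff = (1 << b) - m           # remainders below cutoff use b - 1 bits
    if r < cutoff:
        width, value = b - 1, r
    else:
        width, value = b, r + cutoff
    if width > 0:                   # remainder in truncated binary
        bits += format(value, f"0{width}b")
    return bits


def golomb_decode(stream: str, m: int) -> list[int]:
    """Decode a concatenation of Golomb codewords produced by golomb_encode."""
    b = (m - 1).bit_length()
    cutoff = (1 << b) - m
    out, i = [], 0
    while i < len(stream):
        q = 0
        while stream[i] == "1":     # read the unary quotient
            q += 1
            i += 1
        i += 1                      # skip the terminating 0
        r = 0
        if b > 0:
            if b > 1:
                r = int(stream[i:i + b - 1], 2)
                i += b - 1
            if r >= cutoff:         # long codeword: read one more bit
                r = 2 * r + int(stream[i]) - cutoff
                i += 1
        out.append(q * m + r)
    return out


# Usage: encode the zero runs of a short indicator sequence with m = 3.
support = [0, 0, 1, 0, 0, 0, 0, 1, 1]
runs = zero_run_lengths(support)
encoded = "".join(golomb_encode(n, 3) for n in runs)
decoded = golomb_decode(encoded, 3)
```

Golomb codes are a natural fit when run-lengths are approximately geometrically distributed, in which case `m` is usually tuned to the estimated probability of a one. Here `m = 3` is an arbitrary value chosen for the example.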