Data compression algorithms typically identify repeated sequences of symbols in the original data to produce a compact representation of the same information, while preserving the ability to recover the original data from the compressed sequence. Applying a data transformation before compression can improve the compression achieved, and the overall process remains lossless as long as the transformation is invertible. Floating-point data poses unique challenges for constructing invertible transformations with high compression potential. This paper identifies key conditions under which basic operations on floating-point data are guaranteed to be lossless transformations. We then present four methods that use these observations to deliver lossless compression of real datasets, improving compression rates by up to 40%.
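To illustrate why invertibility is delicate for floating-point data, the following minimal sketch checks whether a transformation and its inverse reproduce each value bit for bit. It is not one of the four methods of this paper; the helper names `bits` and `roundtrip_is_lossless`, the sample values, and the two example transformations are assumptions chosen only for illustration, and IEEE 754 binary64 arithmetic is assumed.

```python
import struct


def bits(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def roundtrip_is_lossless(values, forward, inverse):
    """True if inverse(forward(v)) reproduces every v bit for bit."""
    return all(bits(inverse(forward(v))) == bits(v) for v in values)


# Hypothetical sample data, used only for this illustration.
data = [0.1, 0.2, 0.3]

# Scaling by a power of two changes only the exponent, so the round trip
# is exact as long as no overflow or underflow occurs.
scale_ok = roundtrip_is_lossless(data, lambda v: v * 8.0, lambda v: v / 8.0)

# Adding a large offset rounds away low-order mantissa bits, so
# subtracting the offset afterwards does not restore the original value.
shift_ok = roundtrip_is_lossless(data, lambda v: v + 1e6, lambda v: v - 1e6)

print("scale by 8 lossless:", scale_ok)
print("shift by 1e6 lossless:", shift_ok)
```

Comparing bit patterns rather than using `==` on the floats is a deliberate choice here: it also distinguishes `0.0` from `-0.0` and handles NaN payloads, which a lossless scheme would need to preserve.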