As streaming floating-point data becomes increasingly prevalent, effective and efficient compression of such data is critical. Compression schemes must exploit the similarity, or smoothness, of consecutive numbers and must also cope with extreme conditions, such as high-precision values or the absence of smoothness. We present DeXOR, a novel framework that uses a decimal XOR procedure to encode the longest common prefixes and suffixes in decimal space, achieving optimal prefix reuse and effective redundancy elimination. To ensure accurate and low-cost decompression even in the presence of binary-decimal conversion errors, DeXOR incorporates 1) scaled truncation with error-tolerant rounding and 2) distinct bit management strategies optimized for decimal XOR. Additionally, a robust exception handler improves stability by managing floating-point exponents, maintaining high compression ratios under extreme conditions. In evaluations across 22 datasets, DeXOR surpasses state-of-the-art schemes, achieving a 15% higher compression ratio and 20% faster decompression while maintaining competitive compression speed. DeXOR also scales under varying conditions and remains robust in extreme scenarios where other schemes fail.
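To make the idea of decimal-space prefix reuse concrete, the following is a minimal Python sketch. It is not DeXOR's actual encoding: the function names `encode` and `decode`, the use of decimal strings as input, and the restriction to non-negative finite values are all assumptions made for illustration, and the sketch covers only the shared leading digits, not suffix handling, bit-level layout, rounding, or exception handling.

```python
from decimal import Decimal


def encode(prev: str, curr: str) -> tuple[int, int, str]:
    """Describe curr relative to prev as (places, prefix_len, suffix).

    Hypothetical helper: assumes non-negative, finite decimal strings.
    """
    a, b = Decimal(prev), Decimal(curr)
    # Number of decimal places needed to make both values integers.
    places = max(-a.as_tuple().exponent, -b.as_tuple().exponent, 0)
    scale = Decimal(10) ** places
    sa, sb = str(int(a * scale)), str(int(b * scale))
    # Left-pad with zeros so both digit strings have the same width.
    width = max(len(sa), len(sb))
    sa, sb = sa.zfill(width), sb.zfill(width)
    # Count the leading decimal digits the two values share.
    n = 0
    while n < width and sa[n] == sb[n]:
        n += 1
    return places, n, sb[n:]


def decode(prev: str, places: int, prefix_len: int, suffix: str) -> Decimal:
    """Rebuild the current value from prev plus the stored suffix digits."""
    scale = Decimal(10) ** places
    width = prefix_len + len(suffix)
    sa = str(int(Decimal(prev) * scale)).zfill(width)
    digits = sa[:prefix_len] + suffix
    # Shift the decimal point back by `places` positions.
    return Decimal(int(digits)).scaleb(-places)


places, prefix_len, suffix = encode("23.415", "23.42")
restored = decode("23.415", places, prefix_len, suffix)
```

In this sketch, consecutive values such as 23.415 and 23.42 share the leading digits "234" once both are scaled to three decimal places, so only the remaining digits need to be stored. The smoother the stream, the longer the shared prefix and the shorter the stored remainder.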