The amount of data generated and gathered in scientific simulations and data-collection applications is continuously growing, putting mounting pressure on storage and bandwidth. One way to mitigate this pressure is data compression; however, lossless compression is typically ineffective when applied to floating-point data. Users therefore tend to apply a lossy compressor, which allows small deviations from the original data. It is essential to understand how the error introduced by lossy compression affects the accuracy of subsequent data analytics; thus, we must analyze not only the compression properties but also the error itself. In this paper, we provide a statistical analysis of the error caused by ZFP compression, a state-of-the-art lossy compression algorithm designed explicitly for floating-point data. We show that the error is indeed biased and propose simple modifications to the algorithm that neutralize the bias and further reduce the resulting error.