Dynamic Quality Metric Oriented Error-bounded Lossy Compression for Scientific Datasets

With the ever-increasing execution scale of high performance computing (HPC) applications, scientific research produces vast amounts of data every day. Error-bounded lossy compression is considered a very promising solution to the big-data problem in scientific applications because it can significantly reduce data volume at low time cost while letting users control the compression error through a specified error bound. Existing error-bounded lossy compressors, however, are all built on inflexible designs or fixed compression pipelines, so they cannot adapt to the diverse compression quality requirements and metrics favored by different application users. In this paper, we propose QoZ, a novel dynamic quality metric oriented error-bounded lossy compression framework. Our contributions are threefold. (1) We design a novel, highly parameterized, multi-level interpolation-based data predictor that significantly improves overall compression quality at the same compressed size. (2) We build the error-bounded lossy compression framework QoZ on this adaptive predictor; during online compression, QoZ auto-tunes its critical parameters and optimizes the compression result according to user-specified quality metrics. (3) We carefully evaluate QoZ by comparing its compression quality with multiple state-of-the-art compressors on various real-world scientific application datasets. Experiments show that, compared with the second-best lossy compressor, QoZ achieves up to 70% higher compression ratio under the same error bound, up to 150% higher compression ratio under the same peak signal-to-noise ratio (PSNR), and up to 270% higher compression ratio under the same structural similarity index (SSIM).
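
The abstract's central mechanism is a multi-level interpolation predictor combined with error-bounded quantization. The following is a minimal 1-D sketch of that general idea, not QoZ's actual predictor: it assumes plain linear interpolation, a single global error bound `eb`, and no lossless encoding of the resulting codes, and the names `traversal`, `predict`, `compress`, and `decompress` are hypothetical. QoZ's predictor is highly parameterized and auto-tuned online; this sketch has no tunable parameters at all.

```python
import math

# Hypothetical sketch: generic multi-level interpolation prediction with
# linear quantization. This is NOT QoZ's implementation; it only illustrates
# how an interpolation predictor can honor a pointwise error bound eb.


def traversal(n):
    """Return (index, left, right) triples in coarse-to-fine order.

    Index 0 is the anchor (no neighbors). Every other index i is visited at
    the level whose stride is the largest power of two dividing i, so its
    neighbors i - stride and i + stride were already reconstructed.
    """
    if n == 0:
        return []
    order = [(0, None, None)]
    stride = 1
    while stride * 2 < n:
        stride *= 2
    while stride >= 1:
        for i in range(stride, n, 2 * stride):
            right = i + stride if i + stride < n else None
            order.append((i, i - stride, right))
        stride //= 2
    return order


def predict(recon, left, right):
    """Linear interpolation from reconstructed neighbors (constant at the edge)."""
    if left is None:
        return 0.0  # anchor point: predicted as zero
    if right is None:
        return recon[left]
    return 0.5 * (recon[left] + recon[right])


def compress(data, eb):
    """Return integer quantization codes such that |data - recon| <= eb."""
    recon = [0.0] * len(data)
    codes = [0] * len(data)
    for i, left, right in traversal(len(data)):
        pred = predict(recon, left, right)
        q = round((data[i] - pred) / (2 * eb))  # quantization bin width is 2*eb
        codes[i] = q
        # Store the *reconstructed* value so later predictions match the decoder.
        recon[i] = pred + q * 2 * eb
    return codes


def decompress(codes, eb):
    """Replay the same traversal and predictions to rebuild the data."""
    recon = [0.0] * len(codes)
    for i, left, right in traversal(len(codes)):
        recon[i] = predict(recon, left, right) + codes[i] * 2 * eb
    return recon


if __name__ == "__main__":
    data = [math.sin(0.05 * i) for i in range(100)]
    eb = 1e-3
    out = decompress(compress(data, eb), eb)
    print(max(abs(a - b) for a, b in zip(data, out)))
```

Because the encoder predicts from reconstructed values rather than the originals, the decoder can replay every prediction exactly, which is what keeps each decompressed point within `eb`. In a practical compressor, the integer codes, which tend to cluster near zero for smooth data, would then be entropy-coded.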
