Error-bounded lossy compression has been identified as a promising solution for significantly reducing scientific data volumes while respecting users' requirements on data distortion. Among existing scientific error-bounded lossy compressors, some (such as SPERR and FAZ) reach fairly high compression ratios, and others (such as SZx, SZ, and ZFP) feature high compression speeds, but they rarely deliver both a high ratio and a high speed at the same time. In this paper, we propose HPEZ, which combines newly designed interpolations with quality-metric-driven auto-tuning. HPEZ significantly improves compression quality over existing high-performance compressors while remaining much faster than high-ratio compressors. Our key contributions are as follows: (1) We develop a series of advanced techniques, such as interpolation re-ordering, multi-dimensional interpolation, and natural cubic splines, to significantly improve the compression quality of interpolation-based data prediction. (2) We carefully design the auto-tuning module in HPEZ with novel strategies, including but not limited to block-wise interpolation tuning, dynamic dimension freezing, and Lorenzo tuning. (3) We thoroughly evaluate HPEZ against many other compressors on six real-world scientific datasets. Experiments show that HPEZ outperforms other high-performance error-bounded lossy compressors in compression ratio by up to 140% under the same error bound and by up to 360% under the same PSNR. In parallel data transfer experiments on a distributed database, HPEZ reduces time cost by up to 40% compared with the second-best compressor.
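To make the idea of interpolation-based, error-bounded prediction concrete, the following is a minimal 1D sketch. It is not the HPEZ implementation: it assumes a plain linear interpolation predictor applied coarse to fine, with linear-scale quantization of the residuals, and it omits the multi-dimensional interpolation, cubic splines, auto-tuning, and the entropy and lossless coding stages. The function names (`interp_order`, `predict`, `compress`, `decompress`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def interp_order(n):
    """Yield (index, left, right) from coarse to fine strides.

    `right` is None when the right neighbour falls outside the array.
    """
    stride = 1
    while stride * 2 < n:
        stride *= 2
    while stride >= 1:
        # Odd multiples of `stride`; their neighbours were handled earlier.
        for i in range(stride, n, 2 * stride):
            right = i + stride if i + stride < n else None
            yield i, i - stride, right
        stride //= 2

def predict(recon, left, right):
    """Linear interpolation from already reconstructed neighbours."""
    if right is None:
        return recon[left]  # boundary: fall back to the left neighbour
    return 0.5 * (recon[left] + recon[right])

def compress(data, eb):
    """Return integer quantization codes for a non-empty 1D array."""
    n = len(data)
    recon = np.empty(n)
    codes = np.empty(n, dtype=np.int64)
    # First point: predicted as 0, quantized with bins of width 2 * eb.
    codes[0] = int(np.rint(data[0] / (2 * eb)))
    recon[0] = 2 * eb * codes[0]
    for i, left, right in interp_order(n):
        pred = predict(recon, left, right)
        codes[i] = int(np.rint((data[i] - pred) / (2 * eb)))
        # Track the value the decompressor will see, not the original.
        recon[i] = pred + 2 * eb * codes[i]
    return codes

def decompress(codes, eb):
    """Replay the same predictions to rebuild the data from the codes."""
    n = len(codes)
    recon = np.empty(n)
    recon[0] = 2 * eb * codes[0]
    for i, left, right in interp_order(n):
        recon[i] = predict(recon, left, right) + 2 * eb * codes[i]
    return recon

if __name__ == "__main__":
    x = np.sin(np.linspace(0.0, 6.28, 1000))
    eb = 1e-3
    y = decompress(compress(x, eb), eb)
    print("max abs error:", np.max(np.abs(x - y)), "error bound:", eb)
```

Because each residual is rounded to the nearest bin of width `2 * eb`, every reconstructed value stays within `eb` of the original, up to floating-point rounding. Production compressors typically add an explicit check and store out-of-bound values losslessly. Predicting from reconstructed rather than original neighbours keeps the compressor and decompressor in lockstep.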