Error-bounded lossy compression has been identified as a promising solution for significantly reducing scientific data volumes while respecting users' requirements on data distortion. Among existing scientific error-bounded lossy compressors, some (such as SPERR and FAZ) reach fairly high compression ratios and others (such as SZx, SZ, and ZFP) feature high compression speeds, but they rarely achieve both high ratio and high speed at the same time. In this paper, we propose HPEZ, which uses newly designed interpolations and quality-metric-driven auto-tuning to significantly improve compression quality over existing high-performance compressors while remaining substantially faster than high-ratio compressors. Our key contributions are as follows: (1) We develop a series of advanced techniques, such as interpolation re-ordering, multi-dimensional interpolation, and natural cubic splines, that significantly improve compression quality with interpolation-based data prediction. (2) We carefully design the auto-tuning module in HPEZ with novel strategies, including but not limited to block-wise interpolation tuning, dynamic dimension freezing, and Lorenzo tuning. (3) We thoroughly evaluate HPEZ against many other compressors on six real-world scientific datasets. Experiments show that HPEZ outperforms other high-performance error-bounded lossy compressors in compression ratio by up to 140% under the same error bound and by up to 360% under the same PSNR. In parallel data transfer experiments on a distributed database, HPEZ reduces time cost by up to 40% compared with the second-best compressor.
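To make "interpolation-based data prediction" under an error bound concrete, here is a minimal 1D sketch in Python/NumPy. It is an illustration under stated assumptions, not HPEZ's implementation. Points are visited coarse-to-fine by halving the stride. Each point is predicted from already-reconstructed neighbors, and the residual is quantized with bin width `2 * eb`, so every reconstructed value stays within the absolute error bound `eb`. As a simplification, it uses the four-point cubic formula (-1, 9, 9, -1)/16 in place of the natural cubic splines named above. It also omits entropy coding, unpredictable-value handling, multi-dimensional interpolation, re-ordering, and auto-tuning. The names `compress`, `decompress`, `_predict`, `_traverse`, and `_top_stride` are hypothetical.

```python
import numpy as np


def _top_stride(n):
    """Largest power of two strictly below n (1 when n <= 2)."""
    s = 1
    while s * 2 < n:
        s *= 2
    return s


def _traverse(n):
    """Yield (index, stride) coarse-to-fine; index 0 comes first with stride 0."""
    if n == 0:
        return
    yield 0, 0
    s = _top_stride(n)
    while s >= 1:
        # Odd multiples of s; their neighbours at +-s and +-3s are already done.
        for i in range(s, n, 2 * s):
            yield i, s
        s //= 2


def _predict(recon, i, s, n):
    """Predict point i from reconstructed neighbours at distance s and 3s."""
    if i + s >= n:
        # No right neighbour at the array end: reuse the left neighbour.
        return recon[i - s]
    if i - 3 * s >= 0 and i + 3 * s < n:
        # Four-point cubic interpolation on a uniform grid: (-1, 9, 9, -1) / 16.
        return (-recon[i - 3 * s] + 9 * recon[i - s]
                + 9 * recon[i + s] - recon[i + 3 * s]) / 16.0
    # Only the two nearest neighbours exist: linear interpolation.
    return 0.5 * (recon[i - s] + recon[i + s])


def compress(data, eb):
    """Return integer quantization codes; decompress(codes, eb) stays within eb."""
    n = len(data)
    recon = np.zeros(n)
    codes = np.zeros(n, dtype=np.int64)
    for i, s in _traverse(n):
        pred = 0.0 if s == 0 else _predict(recon, i, s, n)
        codes[i] = int(np.round((data[i] - pred) / (2 * eb)))
        # Predict from reconstructed (not original) values, exactly as the
        # decompressor will, so both sides compute identical predictions.
        recon[i] = pred + codes[i] * 2 * eb
    return codes


def decompress(codes, eb):
    """Rebuild the data by replaying the same predictions and adding residuals."""
    n = len(codes)
    recon = np.zeros(n)
    for i, s in _traverse(n):
        pred = 0.0 if s == 0 else _predict(recon, i, s, n)
        recon[i] = pred + codes[i] * 2 * eb
    return recon
```

On smooth data, most residuals round to small integers, often zero, which is what a downstream entropy coder would exploit. Using reconstructed neighbors, rather than original values, is what keeps the error of each point bounded by `eb` instead of accumulating.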