Error-bounded lossy compression has been widely adopted in many scientific domains because it addresses the challenges of storing, transferring, and analyzing unprecedented volumes of scientific data. Although error-bounded lossy compression offers general control over data distortion by enforcing strict error bounds on raw data, it may fail to meet the quality requirements on the results of downstream analysis derived from the raw data, a.k.a. Quantities of Interest (QoIs). This may lead to uncertainties and even misinterpretations in scientific discoveries, significantly limiting the use of lossy compression in practice. In this paper, we propose QPET, a novel, versatile, and portable framework for QoI-preserving error-bounded lossy compression, which overcomes the challenges of modeling diverse QoIs by leveraging numerical strategies. QPET features (1) high portability to multiple existing lossy compressors, (2) versatile preservation of most differentiable univariate and multivariate QoIs, and (3) significant compression improvements in QoI-preservation tasks. Experiments with six real-world datasets demonstrate that QPET outperforms the existing QoI-preserving compression framework in terms of speed, and that integrating QPET into state-of-the-art error-bounded lossy compressors improves compression ratios by up to 250% over the original compressors and by up to 75% over existing QoI-integrated scientific compressors. At the same peak signal-to-noise ratio in the QoIs, QPET improves the compression ratio by up to 102%.
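To make the gap between a data error bound and a QoI error bound concrete, the following is a minimal sketch, not QPET's algorithm. It assumes a toy uniform quantizer in place of a real compressor and a single hand-picked QoI, f(x) = x^2; the function names, the sample values, and the tolerances `eb` and `tau` are all hypothetical. For this QoI, a pointwise bound that keeps the QoI error within `tau` can be derived in closed form, whereas the abstract describes numerical strategies for general differentiable QoIs.

```python
import math

def quantize(x, eb):
    """Toy stand-in for an error-bounded compressor: |x - result| <= eb."""
    step = 2.0 * eb
    return round(x / step) * step

def qoi(x):
    """Illustrative QoI (an assumption for this sketch): f(x) = x^2."""
    return x * x

def pointwise_bound_for_square(x, tau):
    """Largest eb with |(x + d)^2 - x^2| <= tau for every |d| <= eb.

    Solves 2|x|*eb + eb^2 = tau, i.e. eb = sqrt(x^2 + tau) - |x|,
    written in a form that avoids cancellation for large |x|.
    """
    return tau / (math.sqrt(x * x + tau) + abs(x))

data = [0.013, 0.5, 3.007, 40.003, 250.013]  # hypothetical sample values
eb = 0.01    # global absolute error bound on the raw data
tau = 0.01   # tolerance on the QoI

# QoI error when every value uses the same global data bound.
fixed_errors = [abs(qoi(quantize(x, eb)) - qoi(x)) for x in data]

# QoI error when each value uses a bound derived from the QoI tolerance.
adaptive_bounds = [pointwise_bound_for_square(x, tau) for x in data]
adaptive_errors = [
    abs(qoi(quantize(x, b)) - qoi(x)) for x, b in zip(data, adaptive_bounds)
]

for x, fe, b, ae in zip(data, fixed_errors, adaptive_bounds, adaptive_errors):
    print(f"x={x:>8}: fixed-bound QoI error={fe:.3e}, "
          f"derived bound={b:.3e}, QoI error={ae:.3e}")
```

In this sketch the global bound holds for every raw value, yet the QoI error grows roughly in proportion to |x| and exceeds `tau` for the larger values. The derived bound tightens where |x| is large and relaxes near zero, which is the kind of per-point adjustment a QoI-preserving compressor has to make.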