Compression is crucial for data reduction in modern scientific applications because data from simulations, experiments, and observations is growing exponentially. Compression with progressive retrieval capability lets users access a coarse approximation of the data quickly and then incrementally refine it to higher fidelity. Existing progressive compression solutions suffer from low reduction ratios or high operation costs, which undermines the benefits of the approach. In this paper, we propose the first interpolation-based progressive lossy compression solution that offers both high reduction ratios and low operation costs. The interpolation-based algorithm has been verified as one of the best for scientific data reduction, but no prior work has made it support progressive retrieval. Our contributions are three-fold: (1) We thoroughly analyze the error characteristics of the interpolation algorithm and propose our solution, IPComp, which uses multi-level bitplane and predictive coding. (2) We derive optimized strategies that minimize the amount of data retrieved for a fidelity level that users specify through an error bound or a bitrate. (3) We evaluate the proposed solution using six real-world datasets from four diverse domains. Experimental results demonstrate that our solution achieves up to $487\%$ higher compression ratios and up to $698\%$ higher speed than other state-of-the-art progressive compressors. Compared to baselines, it reduces the data volume for retrieval by up to $83\%$ under the same error bound and reduces the error by up to $99\%$ under the same bitrate.
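To make the idea of bitplane-based progressive retrieval concrete, the following is a minimal sketch and not the IPComp implementation. It assumes the values to be stored are already non-negative integers (for example, quantized prediction residuals), and it omits interpolation, multi-level organization, and predictive coding entirely. The function names `encode_bitplanes`, `decode_prefix`, and `planes_needed` are hypothetical and chosen only for illustration.

```python
from typing import List


def encode_bitplanes(values: List[int], num_planes: int) -> List[List[int]]:
    """Split non-negative integers into bitplanes, most significant first.

    Assumption: every value fits in num_planes bits.
    """
    planes = []
    for p in range(num_planes - 1, -1, -1):
        planes.append([(v >> p) & 1 for v in values])
    return planes


def decode_prefix(planes: List[List[int]], num_planes: int, k: int) -> List[int]:
    """Rebuild approximations from only the first k (most significant) planes."""
    n = len(planes[0])
    out = [0] * n
    for i in range(k):
        shift = num_planes - 1 - i
        for j in range(n):
            out[j] |= planes[i][j] << shift
    return out


def planes_needed(num_planes: int, error_bound: int) -> int:
    """Smallest number of planes whose truncation error stays within error_bound.

    Dropping the lowest d planes leaves an absolute error of at most 2**d - 1.
    """
    d = 0
    while d < num_planes and (1 << (d + 1)) - 1 <= error_bound:
        d += 1
    return num_planes - d


# Hypothetical data: five 8-bit quantized values.
values = [13, 7, 200, 0, 255]
planes = encode_bitplanes(values, 8)
k = planes_needed(8, error_bound=15)
approx = decode_prefix(planes, 8, k)
```

Reading more planes can only tighten the approximation, so a user-specified error bound maps directly to how many planes must be fetched. In this toy setting a bound of 15 requires only the top four of eight planes.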