Compression is a crucial solution for data reduction in modern scientific applications due to the exponential growth of data from simulations, experiments, and observations. Compression with progressive retrieval capability allows users to access coarse approximations of data quickly and then incrementally refine these approximations to higher fidelity. Existing progressive compression solutions suffer from low reduction ratios or high operation costs, which undermines the benefits of the approach. In this paper, we propose the first interpolation-based progressive lossy compression solution that has both high reduction ratios and low operation costs. The interpolation-based algorithm has been verified as one of the best for scientific data reduction, but no prior work has made it support progressive retrieval. Our contributions are three-fold: (1) We thoroughly analyze the error characteristics of the interpolation algorithm and propose our solution, IPComp, with multi-level bitplane and predictive coding. (2) We derive optimized strategies that minimize the data retrieved for a fidelity level that users specify through an error bound or a bitrate. (3) We evaluate the proposed solution using six real-world datasets from four diverse domains. Experimental results demonstrate that our solution achieves up to $487\%$ higher compression ratios and $698\%$ faster speed than other state-of-the-art progressive compressors. Compared to baselines, it reduces the data volume for retrieval by up to $83\%$ under the same error bound and reduces the error by up to $99\%$ under the same bitrate.
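To make the idea of progressive retrieval concrete, the following is a minimal sketch of generic bitplane-based refinement in Python. It is not IPComp's encoder: the abstract does not specify that design, so the function names (`encode_bitplanes`, `decode_bitplanes`), the sign-plus-magnitude layout, and the sample integers are all assumptions made for illustration. The sketch only shows that reading more bitplanes, most significant first, tightens the reconstruction error.

```python
def encode_bitplanes(values, num_planes):
    """Split integers into a sign list plus magnitude bitplanes (MSB first).

    Hypothetical helper for illustration; not taken from the paper.
    """
    signs = [1 if v < 0 else 0 for v in values]
    mags = [abs(v) for v in values]
    if any(m >> num_planes for m in mags):
        raise ValueError("num_planes is too small for the input magnitudes")
    # planes[0] holds the most significant bit of every magnitude.
    planes = [[(m >> p) & 1 for m in mags]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def decode_bitplanes(signs, planes, num_planes, planes_to_use):
    """Rebuild approximations from only the first `planes_to_use` planes."""
    mags = [0] * len(signs)
    for i in range(planes_to_use):
        shift = num_planes - 1 - i
        for j, bit in enumerate(planes[i]):
            mags[j] |= bit << shift
    return [-m if s else m for s, m in zip(signs, mags)]


# Made-up integers standing in for quantized values.
values = [37, -12, 5, 0, -63, 20]
NUM_PLANES = 6
signs, planes = encode_bitplanes(values, NUM_PLANES)

for k in (2, 4, 6):
    approx = decode_bitplanes(signs, planes, NUM_PLANES, k)
    max_err = max(abs(a - v) for a, v in zip(approx, values))
    # Dropping the lowest (NUM_PLANES - k) planes keeps the error below 2**(NUM_PLANES - k).
    assert max_err < 2 ** (NUM_PLANES - k)
    print(k, approx, max_err)
```

In this toy setting, a user who tolerates a larger error reads fewer planes and therefore less data, which is the trade-off the retrieval strategies in contribution (2) are meant to optimize.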