Scientific discoveries are increasingly constrained by limited storage space and I/O capacity. Data from time-series simulations and experiments must often be decimated across timesteps to fit within these limits. In this paper, we propose a technique that reduces storage costs while improving post-analysis accuracy through spatiotemporally adaptive, error-controlled lossy compression. We investigate the trade-off between data precision and temporal output rate, and find that lowering data precision while increasing timestep frequency yields more accurate analysis outcomes. Additionally, we integrate spatiotemporal feature detection with data compression and demonstrate that adaptive error-bounded compression in a higher-dimensional space achieves greater compression ratios by leveraging the error propagation theory of a transformation-based compressor. To evaluate our approach, we conduct experiments with the well-known E3SM climate simulation code and apply our method to compress the variables used for cyclone tracking. Our results show a significant reduction in storage size together with improved cyclone tracking quality, both quantitatively and qualitatively, compared with the prevalent timestep decimation approach. Compared with three state-of-the-art lossy compressors that lack feature preservation capabilities, our adaptive compression framework increases the number of perfectly matched cases in tropical cyclone (TC) tracking by 26.4-51.3% at medium compression ratios and by 77.3-571.1% at large compression ratios, with only 5-11% computational overhead.
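The trade-off between data precision and temporal output rate described in the abstract can be illustrated with a toy experiment. The sketch below is not the paper's compressor: it stands in a plain uniform scalar quantizer for the transformation-based, feature-adaptive method, and it uses a synthetic moving Gaussian bump in place of E3SM output. All function names (`moving_bump`, `quantize`, `dequantize`, `decimate_and_interpolate`) and parameter values are hypothetical and chosen only for illustration. It compares the reconstruction error of (a) keeping every timestep at reduced precision under an absolute error bound against (b) keeping only every few timesteps at full precision and interpolating the rest.

```python
import numpy as np


def moving_bump(n_steps=64, n_points=128, width=3.0):
    """Synthetic (time, space) field: a Gaussian bump drifting across a 1D grid."""
    x = np.arange(n_points)
    centers = np.linspace(10, n_points - 10, n_steps)
    return np.exp(-((x[None, :] - centers[:, None]) ** 2) / (2.0 * width**2))


def quantize(data, eb):
    """Map values to integer codes with bin width 2*eb (absolute error bound eb)."""
    return np.round(data / (2.0 * eb)).astype(np.int64)


def dequantize(codes, eb):
    """Reconstruct values from codes; each value is within eb of the original."""
    return codes * (2.0 * eb)


def decimate_and_interpolate(data, stride):
    """Keep every `stride`-th timestep (axis 0) and linearly interpolate the rest.

    Timesteps after the last kept one are held at the last kept value,
    which is how np.interp treats points beyond its sample range.
    """
    t_all = np.arange(data.shape[0])
    t_kept = t_all[::stride]
    kept = data[::stride]
    out = np.empty_like(data)
    for j in range(data.shape[1]):
        out[:, j] = np.interp(t_all, t_kept, kept[:, j])
    return out


if __name__ == "__main__":
    data = moving_bump()
    eb = 0.01      # assumed absolute error bound for the reduced-precision path
    stride = 4     # assumed decimation rate for the timestep-skipping path

    lossy = dequantize(quantize(data, eb), eb)
    decimated = decimate_and_interpolate(data, stride)

    print("max error, all timesteps at reduced precision:",
          np.max(np.abs(lossy - data)))
    print("max error, decimated timesteps at full precision:",
          np.max(np.abs(decimated - data)))
```

The quantized path has a pointwise error that never exceeds `eb`, whereas the decimated path has no such guarantee: its error depends on how far the feature moves between retained timesteps. A fair comparison would also fix the storage budget for both paths, which this sketch does not attempt.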