Today's scientific simulations produce enormous amounts of data, while I/O bandwidth and storage space remain limited, so significant data volume reduction is required. Error-bounded lossy compression is considered one of the most effective solutions to this problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation data. Unlike previous work, which applies only 1D compression, we propose an approach (TAC) that applies high-dimensional SZ compression to each refinement level of AMR data. To remove data redundancy across levels, we propose several pre-processing strategies and select among them adaptively based on the data features. We further optimize TAC into TAC+ by improving the lossless encoding stage of SZ compression so that it efficiently handles the many small AMR data blocks produced by pre-processing. Experiments on 10 AMR datasets from three real-world large-scale AMR simulations demonstrate that TAC+ improves the compression ratio by up to 4.9$\times$ at the same data distortion, compared to the state-of-the-art method. In addition, we leverage the flexibility of our approach to tune the error bound for each level, which achieves much lower data distortion on two application-specific metrics.
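To make the notion of a per-level error bound concrete, the following is a minimal sketch and not the TAC/TAC+ or SZ implementation. It assumes plain uniform scalar quantization as a stand-in for SZ's prediction-based pipeline, and it represents each AMR level as a flat list of floats. All names (`quantize`, `dequantize`, `compress_levels`, `max_abs_error`) and the example bounds are hypothetical, chosen only for illustration.

```python
import random


def quantize(values, error_bound):
    """Map each value to an integer bin of width 2 * error_bound.

    Rounding to the nearest bin center keeps the pointwise
    reconstruction error within error_bound (up to float rounding).
    """
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes, error_bound):
    """Reconstruct approximate values from integer bin indices."""
    step = 2.0 * error_bound
    return [c * step for c in codes]


def compress_levels(levels, error_bounds):
    """Quantize every refinement level with its own error bound.

    levels:       dict mapping level id -> list of floats (assumed layout)
    error_bounds: dict mapping level id -> absolute error bound
    """
    return {lvl: quantize(vals, error_bounds[lvl]) for lvl, vals in levels.items()}


def max_abs_error(original, reconstructed):
    """Largest pointwise absolute difference between two sequences."""
    return max(abs(a - b) for a, b in zip(original, reconstructed))


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical levels: a coarse one and a finer one with more cells.
    levels = {
        0: [rng.uniform(-1.0, 1.0) for _ in range(64)],
        1: [rng.uniform(-1.0, 1.0) for _ in range(512)],
    }
    # Assumed choice: a tighter bound on the fine level than on the coarse one.
    error_bounds = {0: 1e-2, 1: 1e-3}

    codes = compress_levels(levels, error_bounds)
    for lvl, vals in levels.items():
        recon = dequantize(codes[lvl], error_bounds[lvl])
        print(lvl, max_abs_error(vals, recon) <= error_bounds[lvl] * (1 + 1e-9))
```

The integer codes would then go to a lossless encoder. In this sketch, a looser bound on a level yields fewer distinct codes and therefore more compressible output, which is the trade-off that per-level tuning exploits.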