The growing volume of scientific simulation data poses a significant challenge for storage and transfer. Error-bounded lossy compression has emerged as a critical tool for addressing this challenge: it reduces data size while guaranteeing that the reconstructed data remain valid for scientific analysis. In this paper, we present a data-driven scientific data compressor, called Discontinuous Data-informed Local Subspaces (Discontinuous DLS), designed to improve compression-to-error ratios over data-agnostic compressors. This error-bounded compressor leverages localized spatial and temporal subspaces, informed by the structure of the underlying data, to increase compression efficiency and preserve key features. The technique is flexible and applicable to a wide range of scientific data, including fluid dynamics, environmental simulations, and other high-dimensional, time-dependent datasets. We describe the core principles of the method and demonstrate that it significantly reduces storage requirements without compromising critical data fidelity. The technique is implemented for distributed-memory computing environments using MPI, and its performance is evaluated against state-of-the-art error-bounded compression methods in terms of compression ratio and reconstruction accuracy. This study highlights Discontinuous DLS as a promising approach for large-scale scientific data compression in high-performance computing environments, providing a robust solution for managing the growing data demands of modern scientific simulations.
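The abstract does not specify how the local subspaces are built or how the error bound is enforced. As a rough illustration of the general idea only, the sketch below assumes a (space × time) snapshot matrix split into independent blocks of spatial rows, each compressed with its own truncated SVD basis whose rank is the smallest one keeping the error within a user tolerance `tol`. The function names and the SVD-based construction are hypothetical; this is not the Discontinuous DLS algorithm.

```python
import numpy as np


def compress_local_subspaces(data, block_rows, tol):
    """Hypothetical sketch: compress a (space x time) matrix block by block.

    Each block of `block_rows` spatial rows gets its own truncated-SVD basis.
    The rank is the smallest one whose Frobenius-norm truncation error is
    <= tol; because every entry satisfies |E_ij| <= ||E||_F, the pointwise
    error of each block is also <= tol (up to floating-point rounding).
    """
    if tol < 0:
        raise ValueError("tol must be non-negative")
    compressed = []
    for start in range(0, data.shape[0], block_rows):
        block = data[start:start + block_rows]
        u, s, vt = np.linalg.svd(block, full_matrices=False)
        # tail_sq[r] = sum of s_k^2 for k >= r, i.e. the squared Frobenius
        # error of keeping only the first r modes (Eckart-Young).
        tail_sq = np.append(np.cumsum(s[::-1] ** 2)[::-1], 0.0)
        # tail_sq is non-increasing and ends in 0, so argmax returns the
        # smallest rank that satisfies the tolerance.
        r = int(np.argmax(np.sqrt(tail_sq) <= tol))
        # Store the local spatial coefficients and the local temporal basis.
        compressed.append((u[:, :r] * s[:r], vt[:r]))
    return compressed


def decompress_local_subspaces(compressed):
    """Rebuild the full matrix by stacking each block's low-rank product."""
    return np.vstack([coeffs @ basis for coeffs, basis in compressed])


# Usage: a traveling wave plus a chirp-like term on a 256 x 100 grid.
x = np.linspace(0.0, 1.0, 256)[:, None]
t = np.linspace(0.0, 1.0, 100)[None, :]
data = np.sin(2 * np.pi * (x - t)) + 0.5 * np.cos(6 * np.pi * x * t)

comp = compress_local_subspaces(data, block_rows=32, tol=1e-3)
recon = decompress_local_subspaces(comp)
stored = sum(c.size + b.size for c, b in comp)
print("max abs error:", np.max(np.abs(recon - data)))
print("values stored:", stored, "of", data.size)
```

Bounding the Frobenius norm is a conservative way to guarantee a pointwise error bound; a practical compressor would typically also quantize and entropy-code the stored coefficients, which this sketch omits.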