This paper presents an error-bounded lossy compression method tailored to particle datasets from diverse scientific applications in cosmology, fluid dynamics, and fusion energy sciences. As high-performance computing capabilities advance, these datasets often reach trillions of points, posing significant challenges for visualization, analysis, and storage. While error-bounded lossy compression makes it possible to represent floating-point values with strict pointwise accuracy guarantees, the lack of correlation in the storage ordering of particle data often limits the compression ratio. Inspired by the quantization-encoding schemes in SZ lossy compressors, we dynamically determine the number of bits used to encode the particles in a dataset to increase the compression ratio. Specifically, we use a k-d tree to partition particles into subregions and, for each subregion, generate ``bit boxes'' centered at particles to encode their positions. These bit boxes ensure error control while reducing the number of bits used for compression. We comprehensively evaluate our method against state-of-the-art compressors on cosmology, fluid dynamics, and fusion plasma datasets.
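To make the idea of a spatially adaptive bit count concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a median-split k-d tree with a hypothetical `leaf_size` parameter, and it quantizes each particle as an integer offset from the minimum corner of its leaf's bounding box, which is a simplification of the particle-centered bit boxes described in the abstract. All function names (`kd_leaves`, `encode_leaf`, `decode_leaf`, `payload_bits`) are illustrative.

```python
import random


def kd_leaves(points, leaf_size):
    """Split points at the median of the widest axis until leaves are small."""
    if len(points) <= leaf_size:
        return [points]
    dims = len(points[0])
    # Pick the axis along which the current set of points is most spread out.
    axis = max(
        range(dims),
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return kd_leaves(ordered[:mid], leaf_size) + kd_leaves(ordered[mid:], leaf_size)


def encode_leaf(points, eb):
    """Quantize a leaf with bin width 2*eb relative to its minimum corner.

    Returns the corner, the bits needed per axis, and the integer codes.
    Assumption: one shared bit width per axis per leaf.
    """
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    step = 2.0 * eb
    codes = [[round((p[d] - lo[d]) / step) for d in range(dims)] for p in points]
    # bit_length() of the largest code is the width needed to store every code.
    bits = [max(c[d] for c in codes).bit_length() for d in range(dims)]
    return lo, bits, codes


def decode_leaf(lo, codes, eb):
    """Reconstruct positions; each coordinate is within eb of the original."""
    step = 2.0 * eb
    return [[lo[d] + c[d] * step for d in range(len(lo))] for c in codes]


def payload_bits(leaves, eb):
    """Total bits for the codes only; per-leaf headers (corner, widths) are ignored."""
    total = 0
    for leaf in leaves:
        _, bits, _ = encode_leaf(leaf, eb)
        total += len(leaf) * sum(bits)
    return total


if __name__ == "__main__":
    random.seed(0)
    # Synthetic uniform particles in the unit cube, for illustration only.
    particles = [[random.random() for _ in range(3)] for _ in range(2000)]
    eb = 1e-3
    leaves = kd_leaves(particles, leaf_size=64)
    print("bits with one global box:", payload_bits([particles], eb))
    print("bits with k-d tree leaves:", payload_bits(leaves, eb))
```

Because each leaf spans a smaller extent than the full domain, its offsets need no more bits per coordinate than a single global box would, while rounding to a grid of width `2 * eb` keeps every reconstructed coordinate within `eb` of the original. A real compressor would also have to account for the per-leaf header cost, which this sketch leaves out.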