This paper presents an error-bounded lossy compression method tailored for particle datasets from diverse scientific applications in cosmology, fluid dynamics, and fusion energy sciences. As high-performance computing capabilities advance, these datasets often reach trillions of points, posing significant visualization, analysis, and storage challenges. While error-bounded lossy compression can represent floating-point values with strict pointwise accuracy guarantees, the lack of correlation between particles that are adjacent in storage order often limits the compression ratio. Inspired by the quantization-encoding schemes in SZ lossy compressors, we dynamically determine the number of bits used to encode the particles of a dataset in order to increase the compression ratio. Specifically, we use a k-d tree to partition particles into subregions and, for each subregion, generate ``bit boxes'' centered at particles to encode their positions. These bit boxes preserve error control while reducing the number of bits used for compression. We comprehensively evaluate our method against state-of-the-art compressors on cosmology, fluid dynamics, and fusion plasma datasets.
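To make the idea of spatial partitioning plus per-region bit widths concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a median split along the widest axis as the k-d partition, and it replaces the particle-centered bit boxes with a simpler stand-in: each leaf is quantized relative to its own minimum corner with step `2 * eb`, so the reconstruction error stays within `eb` per coordinate (up to floating-point rounding). The names `split_kd`, `encode_leaf`, `decode_leaf`, and `payload_bits` are hypothetical, and header overhead (leaf origins and bit widths) is ignored.

```python
import math
import random


def split_kd(points, leaf_size):
    """Recursively split points at the median of the widest axis (k-d style)."""
    if len(points) <= leaf_size:
        return [points]
    dims = len(points[0])
    extents = [max(p[d] for p in points) - min(p[d] for p in points)
               for d in range(dims)]
    axis = extents.index(max(extents))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return split_kd(ordered[:mid], leaf_size) + split_kd(ordered[mid:], leaf_size)


def encode_leaf(points, eb):
    """Quantize one leaf relative to its minimum corner (assumed scheme)."""
    dims = len(points[0])
    origin = [min(p[d] for p in points) for d in range(dims)]
    step = 2.0 * eb  # rounding to the nearest multiple keeps the error <= eb
    codes = [[round((p[d] - origin[d]) / step) for d in range(dims)]
             for p in points]
    # Bits per axis: just enough to hold the largest integer code in the leaf.
    bits = [max(c[d] for c in codes).bit_length() for d in range(dims)]
    return origin, codes, bits


def decode_leaf(origin, codes, eb):
    """Reconstruct approximate positions from the integer codes."""
    step = 2.0 * eb
    return [[origin[d] + c[d] * step for d in range(len(origin))] for c in codes]


def payload_bits(leaves, eb):
    """Total code bits over all leaves, ignoring per-leaf header overhead."""
    total = 0
    for leaf in leaves:
        _, _, bits = encode_leaf(leaf, eb)
        total += len(leaf) * sum(bits)
    return total


random.seed(0)
particles = [[random.random() for _ in range(3)] for _ in range(2000)]
eb = 1e-3

global_bits = payload_bits([particles], eb)                # one box for everything
local_bits = payload_bits(split_kd(particles, 64), eb)     # one box per k-d leaf
print("bits per particle, single box:", global_bits / len(particles))
print("bits per particle, k-d leaves:", local_bits / len(particles))
```

Because every leaf spans no more than the full domain along each axis, its largest integer code cannot exceed the global one, so the per-leaf payload never needs more bits than the single-box encoding. The actual saving depends on the leaf size and on how much header data each leaf must store.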