Error-bounded lossy compression has been regarded as a promising way to address the ever-increasing volume of scientific data in today's high-performance computing systems. Pre-quantization, a critical technique for removing sequential dependencies and enabling high parallelism, is widely used to design and develop high-throughput error-controlled data compressors. Despite their extremely high throughput, pre-quantization based compressors generally suffer from low data quality under medium or large user-specified error bounds. In this paper, we investigate the artifacts generated by pre-quantization based compressors and propose a novel algorithm to mitigate them. Our contributions are fourfold: (1) We carefully characterize the artifacts in pre-quantization based compressors to understand the correlation between the quantization index and the compression error; (2) We propose a novel quantization-aware interpolation algorithm to improve the quality of the decompressed data; (3) We parallelize our algorithm in both shared-memory and distributed-memory environments to obtain high performance; (4) We evaluate our algorithm and validate it on two leading pre-quantization based compressors using five real-world datasets. Experiments demonstrate that our artifact mitigation algorithm effectively improves the quality of decompressed data produced by pre-quantization based compressors while maintaining their high compression throughput.