Error-bounded lossy compression is becoming an indispensable technique for the success of today's scientific projects, which produce vast volumes of data during simulations or instrument data acquisition. Not only can it significantly reduce data size, but it can also control compression errors according to user-specified error bounds. Autoencoder (AE) models have been widely used in image compression, but few AE-based compression approaches support the error-bounding features that scientific applications require. To address this issue, we explore using convolutional autoencoders to improve error-bounded lossy compression for scientific data, with the following three key contributions. (1) We provide an in-depth investigation of the characteristics of various autoencoder models and develop an error-bounded autoencoder-based framework based on the SZ model. (2) We optimize the compression quality of the main stages in our AE-based error-bounded compression framework by fine-tuning the block sizes and latent sizes and by optimizing the compression efficiency of latent vectors. (3) We evaluate our proposed solution using five real-world scientific datasets and compare it with six other related works. Experiments show that our solution achieves very competitive compression quality among all the compressors in our tests. In particular, in cases with a high compression ratio, it obtains much better compression quality than SZ2.1 and ZFP (a 100% ~ 800% improvement in compression ratio at the same data distortion).
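To make the error-bounding idea concrete, the following is a minimal sketch of SZ-style prediction plus linear-scale quantization, in which a predictor's output is corrected with integer codes so that every reconstructed value stays within a user-specified absolute error bound `eb`. It is an illustration under stated assumptions, not the authors' implementation: the hypothetical `toy_predictor` (block means) merely stands in for the convolutional autoencoder's decoded output, and `compress`/`decompress` are illustrative names. Values whose quantized reconstruction would miss the bound (for example, due to floating-point rounding) are stored exactly, as SZ-style compressors do for "unpredictable" points.

```python
import math
from typing import Dict, List, Tuple


def toy_predictor(data: List[float], block: int = 4) -> List[float]:
    """Hypothetical stand-in for an autoencoder: predict each value by the
    mean of its block (the block means play the role of a latent vector)."""
    pred: List[float] = []
    for start in range(0, len(data), block):
        chunk = data[start:start + block]
        mean = sum(chunk) / len(chunk)
        pred.extend([mean] * len(chunk))
    return pred


def compress(data: List[float], prediction: List[float],
             eb: float) -> Tuple[List[int], Dict[int, float]]:
    """Quantize prediction residuals with bin width 2*eb (SZ-style sketch)."""
    codes: List[int] = []                 # integer codes, later entropy-coded
    unpredictable: Dict[int, float] = {}  # index -> exact value kept verbatim
    for i, (x, p) in enumerate(zip(data, prediction)):
        q = round((x - p) / (2 * eb))     # nearest quantization bin
        if abs(x - (p + 2 * eb * q)) <= eb:
            codes.append(q)
        else:
            codes.append(0)               # placeholder; exact value stored instead
            unpredictable[i] = x
    return codes, unpredictable


def decompress(codes: List[int], prediction: List[float], eb: float,
               unpredictable: Dict[int, float]) -> List[float]:
    """Rebuild values from the same prediction plus the quantization codes."""
    return [unpredictable[i] if i in unpredictable else p + 2 * eb * q
            for i, (q, p) in enumerate(zip(codes, prediction))]


# Usage: every reconstructed value is within eb of the original.
data = [math.sin(0.3 * i) for i in range(32)]
pred = toy_predictor(data)
codes, unp = compress(data, pred, 1e-2)
recon = decompress(codes, pred, 1e-2, unp)
print(max(abs(a - b) for a, b in zip(data, recon)) <= 1e-2)
```

The better the predictor, the more codes cluster near zero and the better they compress under a subsequent entropy coder; this is why improving the prediction stage (here, the autoencoder) can raise the compression ratio without loosening the error bound.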