Lossy compression has become an important technique for reducing data size in many domains. It is especially valuable for large-scale scientific data, which can reach several petabytes in size. Although autoencoder-based models have been used successfully to compress images and videos, such neural networks have not gained wide attention in the scientific data domain. Our work presents a neural network that significantly compresses large-scale scientific data while maintaining high reconstruction quality. The proposed model is tested on publicly available scientific benchmark data and applied to a large-scale, high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising reconstruction quality. Simulation data spanning 500 years from the High-Resolution Community Earth System Model (CESM) Version 1.3 are also compressed at a compression ratio of 200, with reconstruction error that is negligible for scientific analysis.
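To make the reported figures concrete, the following minimal sketch shows how a compression ratio and a reconstruction-quality score are commonly computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names `compression_ratio` and `psnr` are hypothetical, and the abstract does not say which quality metric is used, so PSNR normalized by the data's value range is assumed here only because it is a common choice for lossy compression of floating-point fields.

```python
import math


def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Ratio of original size to compressed size (higher means smaller output)."""
    if compressed_bytes <= 0:
        raise ValueError("compressed_bytes must be positive")
    return original_bytes / compressed_bytes


def psnr(original: list[float], reconstructed: list[float]) -> float:
    """Peak signal-to-noise ratio in dB, using the value range of the original data.

    Assumed metric: the abstract does not specify how reconstruction
    quality is measured.
    """
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between original and reconstructed values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # lossless reconstruction
    value_range = max(original) - min(original)
    if value_range == 0:
        raise ValueError("original data has zero value range")
    return 20 * math.log10(value_range) - 10 * math.log10(mse)


# Hypothetical example: one million 32-bit floats (4,000,000 bytes)
# stored in 20,000 bytes corresponds to a compression ratio of 200.
ratio = compression_ratio(4_000_000, 20_000)
```

Under this definition, a ratio of 200 means the compressed representation occupies 0.5% of the original storage.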