Exploring Autoencoder-based Error-bounded Compression for Scientific Data

Error-bounded lossy compression is becoming an indispensable technique for today's scientific projects, which produce vast volumes of data during simulations or instrument data acquisition. Not only can it significantly reduce data size, but it can also control the compression errors according to user-specified error bounds. Autoencoder (AE) models have been widely used in image compression, but few AE-based compression approaches support error bounding, which scientific applications require. To address this issue, we explore using convolutional autoencoders to improve error-bounded lossy compression for scientific data, with the following three key contributions. (1) We provide an in-depth investigation of the characteristics of various autoencoder models and develop an error-bounded autoencoder-based framework in terms of the SZ model. (2) We optimize the compression quality for the main stages in our designed AE-based error-bounded compression framework, fine-tuning the block sizes and latent sizes and also optimizing the compression efficiency of latent vectors. (3) We evaluate our proposed solution using five real-world scientific datasets and compare it with six other related works. Experiments show that our solution exhibits a very competitive compression quality among all the compressors in our tests. In absolute terms, it can obtain a much better compression quality (100% to 800% improvement in compression ratio with the same data distortion) compared with SZ2.1 and ZFP in cases with a high compression ratio.
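The abstract does not spell out how an error bound is enforced on top of a learned model. As a rough illustration only, the sketch below shows the general idea behind SZ-style linear-scaling quantization: a predictor supplies an approximate value for each data point, and the residual is quantized into bins of width twice the error bound, so each reconstructed value stays within the bound. Everything here is an assumption for illustration: the function names are hypothetical, the "prediction" is a noisy copy of the data standing in for an autoencoder's output, and this is not the paper's actual pipeline.

```python
import math
import random


def quantize_residuals(data, prediction, error_bound):
    """Map each residual (data - prediction) to an integer bin index.

    Bins have width 2 * error_bound, so rounding to the nearest bin
    leaves a reconstruction error of at most error_bound per point.
    """
    bin_width = 2.0 * error_bound
    return [round((x - p) / bin_width) for x, p in zip(data, prediction)]


def reconstruct(prediction, codes, error_bound):
    """Rebuild the data from the prediction plus the dequantized residuals."""
    bin_width = 2.0 * error_bound
    return [p + c * bin_width for p, c in zip(prediction, codes)]


# Hypothetical setup: a smooth signal and an imperfect prediction of it.
# The Gaussian noise is only a stand-in for an autoencoder's reconstruction error.
random.seed(0)
n = 1000
data = [math.sin(4 * math.pi * i / n) for i in range(n)]
prediction = [x + random.gauss(0.0, 0.05) for x in data]

error_bound = 1e-2
codes = quantize_residuals(data, prediction, error_bound)
recon = reconstruct(prediction, codes, error_bound)

# Largest pointwise error; floating-point rounding may nudge it by a tiny
# amount, so a real compressor would need an explicit check for such points.
max_error = max(abs(x - r) for x, r in zip(data, recon))
print("max abs error:", max_error, "bound:", error_bound)
```

In a scheme like this, the integer codes (which cluster near zero when the prediction is good) would then be passed to an entropy coder, and a better predictor means smaller codes and a higher compression ratio at the same error bound.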


