We propose Deep Dict, a deep learning-based lossy time series compressor designed to achieve a high compression ratio while keeping the decompression error within a predefined range. Deep Dict has two essential components: the Bernoulli transformer autoencoder (BTAE) and a distortion constraint. BTAE extracts Bernoulli representations from time series data, which are smaller than the representations produced by conventional autoencoders. The distortion constraint limits the prediction error of BTAE to the desired range. To address the limitations of common regression losses such as L1 and L2, we also introduce a novel loss function, the quantized entropy loss (QEL). QEL takes the specific characteristics of the problem into account, which improves robustness to outliers and alleviates optimization challenges. Our evaluation on ten diverse time series datasets from various domains shows that Deep Dict outperforms state-of-the-art lossy compressors in compression ratio by up to 53.66%.
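The abstract does not say how the distortion constraint is implemented or how QEL is defined, so the following is only a rough illustration of the general idea, not Deep Dict's actual method. It assumes a common error-bounded scheme: the residual between the original series and a model's prediction is uniformly quantized with step 2ε, which keeps the pointwise reconstruction error at or below ε, and the empirical entropy of the resulting integer codes serves as a rough proxy for the cost of storing them. The names `quantize_residual`, `reconstruct`, `empirical_entropy_bits`, and `eps` are hypothetical.

```python
import numpy as np


def quantize_residual(original, predicted, eps):
    """Quantize the prediction residual with step 2*eps (assumed scheme).

    Rounding to the nearest multiple of 2*eps leaves a remainder of at
    most eps, so the corrected prediction stays within the error bound.
    """
    residual = np.asarray(original, dtype=np.float64) - np.asarray(predicted, dtype=np.float64)
    return np.rint(residual / (2.0 * eps)).astype(np.int64)


def reconstruct(predicted, codes, eps):
    """Apply the integer correction codes to the prediction."""
    return np.asarray(predicted, dtype=np.float64) + codes * (2.0 * eps)


def empirical_entropy_bits(codes):
    """Empirical Shannon entropy (bits per symbol) of the integer codes."""
    _, counts = np.unique(codes, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 10.0, 1000)
    original = np.sin(t)
    # Stand-in for a learned model's output: the signal plus small noise.
    predicted = original + rng.normal(scale=0.05, size=t.shape)

    eps = 0.01
    codes = quantize_residual(original, predicted, eps)
    restored = reconstruct(predicted, codes, eps)

    print("max abs error:", np.max(np.abs(original - restored)))
    print("entropy of codes (bits/sample):", empirical_entropy_bits(codes))
```

In this sketch, a more accurate prediction concentrates the codes around zero and lowers their entropy, which is one plausible reason to penalize the entropy of quantized residuals rather than their L1 or L2 magnitude.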