Self-supervised contrastive learning has become a key technique in deep learning, particularly in time series analysis, because it learns meaningful representations without explicit supervision. Augmentation is a critical component of contrastive learning: the choice of augmentation can dramatically affect performance, sometimes changing accuracy by over 30%. However, augmentations are currently selected either empirically, which can be suboptimal, or by grid search, which is time-consuming. In this paper, we establish a principled framework for selecting augmentations based on dataset characteristics such as trend and seasonality. Specifically, we construct 12 synthetic datasets incorporating trend, seasonality, and integration weights. We then evaluate the effectiveness of 8 different augmentations across these synthetic datasets, thereby inducing generalizable associations between time series characteristics and augmentation efficiency. Additionally, we evaluate the induced associations on 6 real-world datasets spanning domains such as activity recognition, disease diagnosis, traffic monitoring, electricity usage, mechanical fault prognosis, and finance. These real-world datasets are diverse, with 1 to 12 channels, 2 to 10 classes, sequence lengths of 14 to 1280, and sampling frequencies ranging from 250 Hz to daily intervals. The experimental results show that our proposed trend-seasonality-based augmentation recommendation algorithm can accurately identify the effective augmentations for a given time series dataset, achieving an average Recall@3 of 0.667 and outperforming the baselines. Our work provides guidance for studies employing contrastive learning in time series analysis, with wide-ranging applications. All the code, datasets, and analysis results will be released at https://github.com/DL4mHealth/TS-Contrastive-Augmentation-Recommendation.
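For readers unfamiliar with the metric, the following is a minimal sketch of how a Recall@k score could be computed for one dataset. It assumes Recall@k is the fraction of the truly effective augmentations that appear among the top k recommendations; the abstract does not spell out the exact definition. The function name `recall_at_k`, the augmentation names, and the ranking are hypothetical placeholders, not the released implementation or reported data.

```python
from typing import Sequence, Set


def recall_at_k(recommended: Sequence[str], effective: Set[str], k: int = 3) -> float:
    """Fraction of the effective augmentations found in the top-k recommendations.

    Assumed definition for illustration; the paper's exact formula may differ.
    """
    if not effective:
        raise ValueError("effective must contain at least one augmentation")
    top_k = set(recommended[:k])
    return len(top_k & effective) / len(effective)


# Hypothetical ranking produced by a recommender, best first.
recommended = ["jittering", "scaling", "masking", "permutation"]
# Hypothetical set of augmentations that actually work best on this dataset.
effective = {"jittering", "masking", "time_warping"}

score = recall_at_k(recommended, effective, k=3)
print(round(score, 3))
```

In this toy case two of the three effective augmentations are recovered in the top three, so the score is 2/3. An average Recall@3 would then be the mean of such per-dataset scores.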