Semi-Supervised End-To-End Contrastive Learning For Time Series Classification

Time series classification is a critical task in many domains, such as finance, healthcare, and sensor data analysis. Unsupervised contrastive learning has attracted significant interest as a way to learn effective representations from time series data when labels are scarce. Most existing contrastive learning methods follow a two-stage approach: the encoder is first pre-trained on an unlabeled dataset, and the pre-trained model is then fine-tuned on a small labeled dataset. However, such two-stage approaches have several shortcomings. The unsupervised contrastive loss used in pre-training cannot directly influence the downstream classifier trained during fine-tuning, and the classification loss, which is guided by valuable ground-truth labels, is left unexploited during pre-training. In this paper, we propose an end-to-end model called SLOTS (Semi-supervised Learning fOr Time clasSification). SLOTS takes a semi-labeled dataset, comprising a large number of unlabeled samples and a small proportion of labeled samples, and maps it to an embedding space through an encoder. We compute not only the unsupervised contrastive loss but also a supervised contrastive loss on the samples with ground-truth labels. The learned embeddings are fed into a classifier, and a classification loss is computed using the available ground-truth labels. The unsupervised contrastive loss, the supervised contrastive loss, and the classification loss are jointly used to optimize both the encoder and the classifier. We evaluate SLOTS against ten state-of-the-art methods on five datasets. The results demonstrate that SLOTS is a simple yet effective framework: compared with the two-stage framework, our end-to-end SLOTS uses the same input data and incurs a similar computational cost, yet delivers significantly improved performance. We release code and datasets at https://anonymous.4open.science/r/SLOTS-242E.
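The joint objective described above can be made concrete with a small sketch. The abstract does not specify which contrastive losses or loss weights SLOTS uses, so the following rests on assumptions: a SimCLR-style NT-Xent loss for the unsupervised term, the supervised contrastive (SupCon) loss for the labeled term, softmax cross-entropy for the classifier, and a weighted sum with hypothetical weights (all 1.0 by default). All function names (nt_xent, sup_con, cross_entropy, joint_loss) are hypothetical and are not taken from the SLOTS code release. The sketch is pure Python and only evaluates the scalar loss; end-to-end training would need an autodiff framework so that a single backward pass through this sum updates both the encoder and the classifier.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)  # epsilon guards against zero vectors


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def nt_xent(z1, z2, tau=0.5):
    """Unsupervised term (assumed SimCLR-style NT-Xent).

    z1[i] and z2[i] embed two augmented views of sample i; every other
    embedding in the batch serves as a negative. No labels are needed.
    """
    z, n = z1 + z2, len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        sims = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        total += log_sum_exp(sims) - cosine(z[i], z[pos]) / tau
    return total / (2 * n)


def sup_con(z, labels, tau=0.5):
    """Supervised term (assumed SupCon): same-label embeddings are positives."""
    total, anchors = 0.0, 0
    for i in range(len(z)):
        positives = [p for p in range(len(z)) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-class partner is skipped
        denom = log_sum_exp([cosine(z[i], z[a]) / tau for a in range(len(z)) if a != i])
        total += sum(denom - cosine(z[i], z[p]) / tau for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    return sum(log_sum_exp(row) - row[y] for row, y in zip(logits, labels)) / len(labels)


def joint_loss(z1, z2, logits, labels, labeled_idx, tau=0.5, weights=(1.0, 1.0, 1.0)):
    """Hypothetical joint objective: weighted sum of the three losses.

    logits and labels are aligned with labeled_idx, the batch positions of
    the labeled samples; all other samples only enter the unsupervised term.
    """
    w_unsup, w_sup, w_cls = weights
    l_unsup = nt_xent(z1, z2, tau)
    z_lab = [z1[i] for i in labeled_idx] + [z2[i] for i in labeled_idx]
    l_sup = sup_con(z_lab, list(labels) * 2, tau)  # both views keep the label
    l_cls = cross_entropy(logits, labels)
    return w_unsup * l_unsup + w_sup * l_sup + w_cls * l_cls


# Toy batch: four samples with 2-D embeddings; only samples 0 and 2 are labeled.
z1 = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.2], [-1.0, 0.0]]
z2 = [[1.0, 0.0], [0.1, 1.0], [1.0, 0.1], [-1.0, 0.1]]
logits = [[2.0, 0.5], [0.3, 1.5]]  # classifier outputs for samples 0 and 2
labels = [0, 1]
print(joint_loss(z1, z2, logits, labels, labeled_idx=[0, 2]))
```

In this sketch, unlabeled samples contribute only through the unsupervised term, while labeled samples contribute to all three terms. That is what lets the ground-truth labels shape the encoder directly, rather than only during a separate fine-tuning stage.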