Learning semantically rich representations from raw, unlabeled time series data is critical for downstream tasks such as classification and forecasting. Contrastive learning has recently shown promising representation learning capability in the absence of expert annotations. However, existing contrastive approaches generally treat each instance independently, which produces false negative pairs: pairs that are pushed apart as negatives even though they share the same semantics. To tackle this problem, we propose MHCCL, a Masked Hierarchical Cluster-wise Contrastive Learning model, which exploits semantic information obtained from a hierarchical structure consisting of multiple latent partitions of multivariate time series. Motivated by the observation that fine-grained clustering preserves higher purity while coarse-grained clustering reflects higher-level semantics, we propose a novel downward masking strategy that filters out false negatives and supplements positives by incorporating multi-granularity information from the clustering hierarchy. In addition, a novel upward masking strategy is designed in MHCCL to remove outliers from the clusters at each partition in order to refine the prototypes, which helps speed up the hierarchical clustering process and improves clustering quality. We conduct experimental evaluations on seven widely used multivariate time series datasets. The results demonstrate the superiority of MHCCL over state-of-the-art approaches for unsupervised time series representation learning.
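To make the false-negative issue concrete, the following is a minimal sketch of a cluster-aware contrastive loss at a single partition. It is not the MHCCL objective, which the abstract does not spell out. The function name `cluster_masked_contrastive_loss`, the `temperature` value, and the use of one flat set of cluster labels are all assumptions made for illustration. Instances that share a cluster with the anchor are removed from the negative set and used as additional positives.

```python
import numpy as np


def cluster_masked_contrastive_loss(z, labels, temperature=0.5):
    """Illustrative single-partition loss (hypothetical, not the paper's code).

    z:      (N, D) array of instance embeddings.
    labels: (N,) array of cluster assignments at one granularity.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)

    # Cosine similarity between all pairs, scaled by the temperature.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    exp_sim = np.exp(sim)

    same_cluster = labels[:, None] == labels[None, :]
    # Positives: other instances in the anchor's cluster.
    pos_mask = same_cluster & ~np.eye(len(z), dtype=bool)
    # Negatives: only instances from other clusters, so same-cluster
    # instances (potential false negatives) are masked out.
    neg_mask = ~same_cluster

    # For each (anchor, positive) pair: log of exp(s_ip) over
    # exp(s_ip) plus the sum of exp(s_in) across the anchor's negatives.
    neg_sum = (exp_sim * neg_mask).sum(axis=1, keepdims=True)
    log_prob = sim - np.log(exp_sim + neg_sum)

    # Average over positives per anchor; skip anchors with no positives.
    n_pos = pos_mask.sum(axis=1)
    has_pos = n_pos > 0
    if not has_pos.any():
        return 0.0
    per_anchor = -(log_prob * pos_mask).sum(axis=1)[has_pos] / n_pos[has_pos]
    return float(per_anchor.mean())


# Toy usage: two tight groups of 2-D embeddings.
toy_z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = cluster_masked_contrastive_loss(toy_z, [0, 0, 1, 1])
loss_mismatched = cluster_masked_contrastive_loss(toy_z, [0, 1, 0, 1])
```

On this toy input the loss is lower when the labels match the geometric grouping than when they cut across it, which is the behavior a cluster-wise objective is meant to reward. Extending the sketch to several partitions of different granularity, as the hierarchy in the abstract suggests, would require a rule for combining the masks across levels that is not given here.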