Analyzing sequential data is crucial in many domains, particularly given the abundance of data collected under the Internet of Things paradigm. Time series classification, the task of categorizing sequential data, has gained prominence, with machine learning approaches demonstrating remarkable performance on public benchmark datasets. However, progress has come primarily from designing architectures that learn representations from raw data at fixed (or ideal) time scales, which can fail to generalize to longer sequences. This work introduces a \textit{compositional representation learning} approach trained on statistically coherent components extracted from sequential data. We propose an unsupervised approach, based on a multi-scale change space, that segments sequential data into chunks with similar statistical properties. A sequence-based encoder model is then trained in a multi-task setting to learn compositional representations from these temporal components for time series classification. We demonstrate the effectiveness of the approach through extensive experiments on publicly available time series classification benchmarks. An evaluation of the coherence of the segmented components further shows that the approach is competitive on the unsupervised segmentation task.