Labeling time series data is expensive because it requires domain expertise and because the data are dynamic in nature. Hence, we often have to work in settings with limited labeled data. Data augmentation techniques have been successfully deployed in domains such as computer vision to make better use of existing labeled data. We adapt one of the most commonly used techniques, MixUp, to the time series domain. Our proposed methods, MixUp++ and LatentMixUp++, use simple modifications to perform interpolation in the raw time series and in the classification model's latent space, respectively. We also extend these methods with semi-supervised learning to exploit unlabeled data. With LatentMixUp++, we observe significant improvements of 1\% to 15\% on time series classification on two public datasets, in both low and high labeled data regimes.
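For readers unfamiliar with the interpolation that MixUp performs, the following is a minimal sketch of the standard (vanilla) MixUp step applied to a batch of raw time series. It is not the MixUp++ or LatentMixUp++ procedure, whose specific modifications are not described in this abstract. The function name `mixup_batch`, the array layout `(batch, length, channels)`, and the default `alpha=0.2` are assumptions chosen for illustration.

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Vanilla MixUp on one batch (illustrative sketch, not MixUp++).

    x:        array of shape (batch, length, channels), raw time series
    y_onehot: array of shape (batch, classes), one-hot labels
    alpha:    Beta distribution parameter (assumed value, tune per task)
    """
    rng = np.random.default_rng() if rng is None else rng

    # One mixing coefficient per batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)

    # Pair each sample with a randomly chosen partner from the same batch.
    perm = rng.permutation(x.shape[0])

    # Convex combination of inputs and of labels with the same coefficient.
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 50, 3))             # 8 series, 50 steps, 3 channels
    y = np.eye(4)[rng.integers(0, 4, size=8)]   # 4 classes, one-hot
    x_mix, y_mix, lam = mixup_batch(x, y, rng=rng)
    print(x_mix.shape, y_mix.shape)
```

A latent-space variant would, under the same assumptions, apply the identical convex combination to the hidden representations produced by an intermediate layer of the classifier instead of to `x` itself.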