Data augmentation can significantly enhance the performance of machine learning tasks by addressing data scarcity and improving generalization. However, generating time series data presents unique challenges. A model must not only learn a probability distribution that reflects the real data distribution but also capture the conditional distribution at each time step to preserve the inherent temporal dependencies. To address these challenges, we introduce AVATAR, a framework that combines Adversarial Autoencoders (AAE) with Autoregressive Learning to achieve both objectives. Specifically, our technique integrates the autoencoder with a supervisor and introduces a novel supervised loss to assist the decoder in learning the temporal dynamics of time series data. Additionally, we propose another innovative loss function, termed distribution loss, to guide the encoder in more efficiently aligning the aggregated posterior of the autoencoder's latent representation with a prior Gaussian distribution. Furthermore, our framework employs a joint training mechanism to simultaneously train all networks using a combined loss, thereby fulfilling the dual objectives of time series generation. We evaluate our technique across a variety of time series datasets with diverse characteristics. Our experiments demonstrate significant improvements in both the quality and practical utility of the generated data, as assessed by various qualitative and quantitative metrics.
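The abstract does not specify the form of the proposed distribution loss, only its goal: guiding the encoder to align the aggregated posterior of the latent codes with a Gaussian prior. As a purely illustrative stand-in (not the loss defined in the paper), a simple moment-matching penalty captures the same intent, penalizing deviation of the empirical latent mean from 0 and the empirical variance from 1:

```python
import numpy as np

def prior_alignment_loss(z: np.ndarray) -> float:
    """Hypothetical moment-matching penalty on a batch of latent codes
    z with shape [batch, latent_dim]. It is small when the batch
    statistics resemble a standard Gaussian N(0, I) and grows as the
    empirical mean drifts from 0 or the empirical variance from 1.
    This sketch only illustrates the alignment objective; AVATAR's
    actual distribution loss is defined in the paper, not here."""
    mean = z.mean(axis=0)               # per-dimension empirical mean
    var = z.var(axis=0)                 # per-dimension empirical variance
    return float((mean ** 2).mean() + ((var - 1.0) ** 2).mean())

rng = np.random.default_rng(0)
# Codes already distributed like the prior incur a near-zero penalty...
z_prior = rng.standard_normal((1024, 16))
# ...while shifted and scaled codes are penalized more heavily.
z_off = 2.0 * rng.standard_normal((1024, 16)) + 1.0

loss_prior = prior_alignment_loss(z_prior)
loss_off = prior_alignment_loss(z_off)
```

In an adversarial autoencoder, this role is usually played by a discriminator on the latent space; a direct statistical penalty like the one above is one way such alignment can be made more sample-efficient, which is the property the abstract claims for the proposed loss.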