StiefelGen: A Simple, Model Agnostic Approach for Time Series Data Augmentation over Riemannian Manifolds

Data augmentation is an area of research which has seen active development in many machine learning fields, such as image-based learning models, reinforcement learning for self-driving vehicles, and general noise injection for point cloud data. However, convincing methods for general time series data augmentation still leave much to be desired, especially since the methods developed for these other models do not readily cross over. Three common approaches to time series data augmentation are: (i) constructing a physics-based model and then introducing uncertainty into it, for example over the coefficient space, (ii) adding noise to the observed data set(s), and (iii) training a robust generative neural network model on an ample collection of time series data sets. However, for many practical industry problems that work with time series data: (i) one usually does not have access to a robust physical model, (ii) the addition of noise can in and of itself require large or difficult assumptions (for example, what probability distribution should be used, or how large should the noise variance be?), and (iii) in practice, it can be difficult to source a large, representative time series database with which to train the neural network model for the underlying problem. In this paper, we propose a methodology which attempts to tackle all three of these limitations simultaneously, to a large extent. The method relies upon the well-studied matrix differential geometry of the Stiefel manifold, and proposes a simple way in which time series signals can be placed on, and then smoothly perturbed over, the manifold. We attempt to clarify how this method works by showcasing several potential use cases which, in particular, take advantage of the unique properties of this underlying manifold.
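
To make the idea of placing a signal on the Stiefel manifold and perturbing it there more concrete, the following is a minimal sketch and not the implementation from the paper. It rests on several assumptions that the abstract does not state: the signal is reshaped into a matrix, a thin SVD supplies an orthonormal factor `U` (a point on a Stiefel manifold), and the move along the manifold uses a QR retraction as a simple stand-in for a geodesic step. The function names and the `rows`, `step`, and `seed` parameters are hypothetical.

```python
import numpy as np


def project_to_tangent(U, Z):
    """Project an arbitrary matrix Z onto the tangent space of the
    Stiefel manifold at U (embedded metric): Z - U * sym(U^T Z)."""
    UtZ = U.T @ Z
    return Z - U @ (UtZ + UtZ.T) / 2.0


def qr_retract(U, D):
    """Map the off-manifold point U + D back onto the Stiefel manifold
    with a QR retraction (an assumed stand-in for a geodesic step)."""
    Q, R = np.linalg.qr(U + D)
    # Fix column signs so that diag(R) > 0; this makes the map continuous
    # and returns U itself when D = 0.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def perturb_signal(x, rows, step=0.1, seed=0):
    """Hypothetical augmentation: reshape x into a (rows, -1) matrix,
    perturb its left orthonormal SVD factor on the manifold, rebuild."""
    rng = np.random.default_rng(seed)
    X = np.asarray(x, dtype=float).reshape(rows, -1)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    # Random tangent direction at U, rescaled to Frobenius norm `step`.
    D = project_to_tangent(U, rng.standard_normal(U.shape))
    D *= step / np.linalg.norm(D)

    U_new = qr_retract(U, D)
    # Reassemble with the original singular values and right factor.
    return ((U_new * s) @ Vt).reshape(np.shape(x))


# Usage: a 200-sample signal arranged as a 20 x 10 matrix.
t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 13 * t)
y = perturb_signal(x, rows=20, step=0.1)
```

In this sketch the only tuning knob is `step`, the size of the move on the manifold: `step=0` returns the original signal, and larger values move the sample further from it. No noise distribution has to be chosen for the signal itself, which is the kind of assumption the abstract aims to avoid.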
