Time series data appear in almost every domain, from medicine to manufacturing and wireless communication. Generating realistic and useful exemplars and prototypes is a fundamental data analysis task. In this paper, we investigate a novel approach to generating such prototypes for time series data, based on a new form of time series average, the ShapeDTW Barycentric Average (ShapeDBA). Existing time series prototyping approaches, such as DTW Barycenter Averaging (DBA) and SoftDBA, rely on the Dynamic Time Warping (DTW) similarity measure. These approaches share a common problem: their prototypes contain out-of-distribution artifacts. This is mostly caused by the DTW variant they use, which detects only absolute (point-to-point) similarities and is incapable of detecting neighborhood similarities. Our proposed method, ShapeDBA, uses the ShapeDTW variant of DTW, which overcomes this issue. We chose time series clustering, a popular form of time series analysis, to evaluate ShapeDBA against the other prototyping approaches. Coupled with the k-means clustering algorithm and evaluated on a total of 123 datasets from the UCR archive, our proposed averaging approach achieves new state-of-the-art results in terms of Adjusted Rand Index.
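The distinction between absolute and neighborhood similarity can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the function names (`dtw_cost`, `dtw`, `subsequences`, `shape_dtw`) and the `reach` parameter are hypothetical, the descriptor is assumed to be the raw window of values around each point, and the accumulated descriptor cost is returned directly as the distance, which is a simplification.

```python
import numpy as np


def dtw_cost(cost):
    """Accumulate a pairwise cost matrix with the classic DTW recursion."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell extends the cheapest of the three allowed predecessors.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return float(acc[n, m])


def dtw(x, y):
    """Plain DTW: the local cost compares single points (absolute similarity)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cost = (x[:, None] - y[None, :]) ** 2
    return dtw_cost(cost)


def subsequences(x, reach):
    """Return the window of length 2 * reach + 1 centered on each point.

    The series is edge-padded so that boundary points also get a full window.
    """
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, reach, mode="edge")
    return np.stack([padded[i : i + 2 * reach + 1] for i in range(len(x))])


def shape_dtw(x, y, reach=2):
    """ShapeDTW-style sketch: the local cost compares neighborhoods.

    Assumption: the descriptor is the raw window around each point, and the
    accumulated descriptor cost is used as the distance.
    """
    dx, dy = subsequences(x, reach), subsequences(y, reach)
    cost = ((dx[:, None, :] - dy[None, :, :]) ** 2).sum(axis=2)
    return dtw_cost(cost)
```

With `reach=0` every window contains a single point, so `shape_dtw` reduces to `dtw`. With a larger `reach`, two points of equal value are matched cheaply only if their surroundings also look alike, which is the neighborhood behavior the abstract attributes to ShapeDTW.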