Modern time series analysis requires the ability to handle datasets that are inherently high-dimensional. Examples include climatology, where measurements from numerous sensors must be taken into account, and inventory tracking for large shops, where the dimension is the number of tracked items. The standard way to mitigate the computational issues arising from high dimensionality is to apply a dimension reduction technique that preserves the structural properties of the ambient space. The dissimilarity between two time series is often measured by ``discrete'' notions of distance, e.g., dynamic time warping or the discrete Fr\'echet distance. Since these distance functions are computed directly on the points of a time series, they are sensitive to different sampling rates or gaps. The continuous Fr\'echet distance is a popular alternative that aims to alleviate this by taking into account all points on the polygonal curve obtained by linearly interpolating between consecutive points of a sequence. We study the ability of random projections \`a la Johnson and Lindenstrauss to preserve the continuous Fr\'echet distance of polygonal curves while reducing the dimension. In particular, we show that one can reduce the dimension to $O(\epsilon^{-2} \log N)$, where $N$ is the total number of input points, while preserving the continuous Fr\'echet distance between any two determined polygonal curves within a factor of $1\pm \epsilon$. We conclude with applications to clustering.
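The pipeline described above can be illustrated with a minimal sketch, given here under several assumptions rather than as the construction analyzed in the paper. It uses a dense Gaussian matrix scaled by $1/\sqrt{k}$, which is one common Johnson-Lindenstrauss construction. The target dimension $k = 60$ is picked by hand and is not derived from the $O(\epsilon^{-2} \log N)$ bound. The continuous Fr\'echet distance is not computed exactly: each curve is resampled along its edges and the discrete Fr\'echet distance of the samples is used, which is an upper bound on the continuous distance that tightens as the sampling gets finer. All function names (`jl_project`, `resample`, `discrete_frechet`, `distortion_ratio`) are hypothetical.

```python
import math
import random


def jl_project(points, target_dim, rng):
    """Apply one shared Gaussian matrix, scaled by 1/sqrt(target_dim), to every point."""
    source_dim = len(points[0])
    scale = 1.0 / math.sqrt(target_dim)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
              for _ in range(target_dim)]
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix] for p in points]


def resample(curve, per_edge):
    """Linearly interpolate: per_edge samples on each edge, plus the last vertex."""
    out = []
    for a, b in zip(curve, curve[1:]):
        for k in range(per_edge):
            t = k / per_edge
            out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    out.append(list(curve[-1]))
    return out


def discrete_frechet(p, q):
    """Dynamic program over couplings of the two point sequences, O(len(p) * len(q))."""
    prev = []
    for i in range(len(p)):
        cur = []
        for j in range(len(q)):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                best = d
            elif i == 0:
                best = max(cur[j - 1], d)
            elif j == 0:
                best = max(prev[0], d)
            else:
                best = max(min(prev[j], prev[j - 1], cur[j - 1]), d)
            cur.append(best)
        prev = cur
    return prev[-1]


def distortion_ratio(seed, source_dim=200, target_dim=60, vertices=6, per_edge=8):
    """Ratio (projected / original) of the approximate Frechet distance of two random curves."""
    rng = random.Random(seed)
    curve_a = [[rng.gauss(0.0, 1.0) for _ in range(source_dim)] for _ in range(vertices)]
    curve_b = [[rng.gauss(0.0, 1.0) for _ in range(source_dim)] for _ in range(vertices)]
    # Project all vertices with the same matrix, then split back into two curves.
    projected = jl_project(curve_a + curve_b, target_dim, rng)
    proj_a, proj_b = projected[:vertices], projected[vertices:]
    before = discrete_frechet(resample(curve_a, per_edge), resample(curve_b, per_edge))
    after = discrete_frechet(resample(proj_a, per_edge), resample(proj_b, per_edge))
    return after / before


if __name__ == "__main__":
    print(distortion_ratio(seed=0))
```

Only the vertices are projected. Because the projection is linear, it commutes with linear interpolation, so resampling the projected vertices gives the same points as projecting the resampled curve. A ratio close to 1 in this experiment is consistent with the stated $1\pm \epsilon$ guarantee but does not verify it.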