The DTW Barycenter Averaging (DBA) algorithm is widely used for estimating the mean of a given set of point sequences. In this context, the mean is defined as a point sequence that minimises the sum of dynamic time warping (DTW) distances to the input sequences. The algorithm is similar to the $k$-means algorithm in the sense that it alternately repeats two steps: (1) computing an optimal assignment of the input points to the points of the current mean, and (2) computing an optimal mean under the current assignment. The popularity of DBA can be attributed to the fact that it works well in practice, even though no theoretical guarantees are known. In this paper, we aim to initiate a theoretical study of the number of iterations that DBA performs until convergence. We assume the algorithm is given $n$ sequences of $m$ points in $\mathbb{R}^d$ and a parameter $k$ that specifies the length of the mean sequence to be computed. We show that, in contrast to its fast running time in practice, the number of iterations can be exponential in $k$ in the worst case, even if the number of input sequences is $n=2$. We complement these findings with experiments on real-world data that suggest this worst-case behaviour is likely degenerate. To better understand the performance of the algorithm on non-degenerate input, we study DBA in the model of smoothed analysis, upper-bounding the expected number of iterations in the worst case under random perturbations of the input. Our smoothed upper bound is polynomial in $k$, $n$ and $d$, and for constant $n$, it is also polynomial in $m$. For our analysis, we adapt the set of techniques that were developed for analysing $k$-means and observe that this set of techniques is not sufficient to obtain tight bounds for general $n$.
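The two alternating steps described above can be illustrated with a minimal Python sketch. It is not the implementation studied in the paper: the function names `dtw_path` and `dba`, the squared Euclidean point cost, the initialisation by subsampling the first input sequence, and the stopping rule (stop when the assignment no longer changes) are all assumptions made for illustration.

```python
import numpy as np


def dtw_path(mean, seq):
    """Optimal DTW warping path between mean (k x d) and seq (m x d).

    Assumption: the point-wise cost is the squared Euclidean distance.
    Returns (total cost, list of matched index pairs (i, j)).
    """
    k, m = len(mean), len(seq)
    dist = ((mean[:, None, :] - seq[None, :, :]) ** 2).sum(axis=2)
    acc = np.full((k + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, k + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the last cell to the first to recover the path.
    i, j = k, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    return float(acc[k, m]), path[::-1]


def dba(sequences, k, max_iter=100):
    """Alternate assignment and mean update until the assignment is stable.

    Returns the mean sequence (k x d) and the cost after each assignment step.
    """
    seqs = [np.asarray(s, dtype=float) for s in sequences]
    # Assumed initialisation: k evenly spaced points of the first sequence.
    first = seqs[0]
    idx = np.linspace(0, len(first) - 1, k).round().astype(int)
    mean = first[idx].copy()
    costs, prev_paths = [], None
    for _ in range(max_iter):
        # Step 1: optimal assignment of input points to the current mean.
        results = [dtw_path(mean, s) for s in seqs]
        paths = [p for _, p in results]
        costs.append(sum(c for c, _ in results))
        if paths == prev_paths:
            break  # assignment unchanged, so the mean would not change
        # Step 2: optimal mean under this assignment (centroid per mean point).
        sums = np.zeros_like(mean)
        counts = np.zeros(k)
        for s, path in zip(seqs, paths):
            for i, j in path:
                sums[i] += s[j]
                counts[i] += 1
        mean = sums / counts[:, None]
        prev_paths = paths
    return mean, costs
```

Under these assumptions, `len(costs)` counts the assignment steps performed before the assignment stabilises, which is the quantity the iteration bounds are about, and the recorded costs never increase from one iteration to the next.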