A novel methodology is proposed for clustering multivariate time series data using the energy distance defined in Sz\'ekely and Rizzo (2013). Specifically, a dissimilarity matrix is formed by using the energy distance statistic to measure the separation between the finite-dimensional distributions of the component time series. Once the pairwise dissimilarity matrix is calculated, a hierarchical clustering method is applied to obtain the dendrogram. The procedure is completely nonparametric, since the dissimilarities between stationary distributions are calculated directly, without any model assumptions. To justify the procedure, asymptotic properties of the energy distance estimates are derived for general stationary and ergodic time series. The method is illustrated in a simulation study with various component time series, both linear and nonlinear. Finally, the methodology is applied to two examples: one involves the GDP of selected countries, and the other the population sizes of various U.S. states over the years 1900-1999.
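A minimal sketch of the pipeline described above, assuming NumPy and SciPy are available. It estimates the energy distance with the V-statistic form of 2E|X-Y| - E|X-X'| - E|Y-Y'| on overlapping blocks of length `d` (a stand-in for the d-dimensional finite-dimensional distribution), then runs average-linkage hierarchical clustering on the resulting dissimilarity matrix. The helper names (`embed`, `energy_distance`, `energy_dissimilarity_matrix`, `cluster_series`, `ar1`), the block length `d`, and the choice of average linkage are illustrative assumptions, not necessarily the choices made in the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist, squareform


def embed(x, d):
    """Rows are the overlapping blocks (x_t, ..., x_{t+d-1}): draws from the
    d-dimensional distribution of a stationary series (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a univariate series as shape (n, 1)
    return np.stack([x[t:t + d].ravel() for t in range(x.shape[0] - d + 1)])


def energy_distance(a, b):
    """V-statistic estimate of 2E|X-Y| - E|X-X'| - E|Y-Y'| with Euclidean norm."""
    return 2.0 * cdist(a, b).mean() - cdist(a, a).mean() - cdist(b, b).mean()


def energy_dissimilarity_matrix(series, d=2):
    """Symmetric k x k matrix of energy distances between the d-dimensional
    block distributions of the k component series."""
    blocks = [embed(s, d) for s in series]
    k = len(blocks)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = energy_distance(blocks[i], blocks[j])
    return D


def cluster_series(series, d=2, n_clusters=2, method="average"):
    """Agglomerative clustering on the energy-distance dissimilarities.
    Returns the linkage matrix (usable for a dendrogram) and flat labels."""
    D = energy_dissimilarity_matrix(series, d)
    Z = linkage(squareform(D, checks=False), method=method)
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return Z, labels


def ar1(phi, n, rng, burn=200):
    """Simulate a Gaussian AR(1) path, discarding a burn-in so it is close to stationary."""
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x[burn:]


# Hypothetical illustration: AR(1) series with phi = 0.8 and phi = -0.8 share the
# same Gaussian marginal, so they differ only in their joint (lag-1) distribution.
rng = np.random.default_rng(0)
series = [ar1(0.8, 2000, rng) for _ in range(3)] + [ar1(-0.8, 2000, rng) for _ in range(3)]
Z, labels = cluster_series(series, d=2, n_clusters=2)
print(labels)
```

With `d=1` only the marginal distributions are compared, and these coincide for the two simulated groups, so it is the lag-1 structure captured by `d=2` that separates them here. The dendrogram can be drawn from `Z` with `scipy.cluster.hierarchy.dendrogram`. For many long series, the within-sample terms E|X-X'| could be computed once per series and reused across pairs rather than recomputed inside `energy_distance`.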