Data objects taking values in a general metric space have become increasingly common in modern data analysis. In this paper, we study two important statistical inference problems, namely two-sample testing and change-point detection, for such non-Euclidean data under temporal dependence. Typical examples of non-Euclidean-valued time series include yearly mortality distributions, time-varying networks, and covariance matrix time series. To accommodate unknown temporal dependence, we extend the self-normalization (SN) technique (Shao, 2010) to the inference of non-Euclidean time series. This extension is substantially different from existing SN-based inference for functional time series residing in a Hilbert space (Zhang et al., 2011). Theoretically, we propose new regularity conditions that may be easier to check than those in the recent literature, and we derive the limiting distributions of the proposed test statistics under both the null and local alternatives. For the change-point detection problem, we also establish the consistency of the change-point location estimator, and we combine the proposed change-point test with wild binary segmentation to estimate multiple change points. Numerical simulations demonstrate the effectiveness and robustness of the proposed tests relative to existing methods in the literature. Finally, we apply our tests to two-sample inference on mortality data and to change-point detection in cryptocurrency data.
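The metric-space statistics studied in the paper are not reproduced here. To illustrate the underlying self-normalization idea, the sketch below uses the simplest Euclidean setting: a test of the mean of a scalar stationary series, in the spirit of Shao (2010). It is a minimal sketch under stated assumptions. The function names `sn_statistic` and `sn_critical_value` are hypothetical. The critical value is approximated by Monte Carlo on i.i.d. Gaussian samples of finite length, which is an assumption rather than a tabulated value.

The key point is that the normalizer is built from recursive partial sums instead of a long-run variance estimate. The unknown variance therefore cancels in the limit, and no bandwidth has to be chosen.

```python
import random


def sn_statistic(x, mu0):
    """Self-normalized statistic for H0: E[X_t] = mu0 in a scalar series.

    Illustrative Euclidean sketch only, not the paper's metric-space statistic.
    T_n = n * (xbar - mu0)**2 / W_n, where
    W_n = n**-2 * sum_{t=1}^{n} (S_t - t * xbar)**2 and S_t is the t-th
    partial sum. W_n replaces a long-run variance estimate, so no
    bandwidth or kernel choice is needed.
    """
    n = len(x)
    xbar = sum(x) / n
    partial = 0.0
    w = 0.0
    for t, xt in enumerate(x, start=1):
        partial += xt
        w += (partial - t * xbar) ** 2
    w /= n ** 2
    return n * (xbar - mu0) ** 2 / w


def sn_critical_value(alpha=0.05, n=200, reps=1000, seed=0):
    """Monte Carlo (1 - alpha) quantile of the pivotal SN limit.

    Under H0 the statistic converges to W(1)**2 / int_0^1 (W(r) - r W(1))**2 dr
    for a standard Brownian motion W, whatever the long-run variance. Here
    that limit is approximated (an assumption) by applying sn_statistic to
    i.i.d. N(0, 1) samples of length n.
    """
    rng = random.Random(seed)
    draws = sorted(
        sn_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0)
        for _ in range(reps)
    )
    k = min(reps - 1, max(0, int((1 - alpha) * reps) - 1))
    return draws[k]


if __name__ == "__main__":
    # Hypothetical example: an AR(1) series with mean 0 and coefficient 0.5.
    rng = random.Random(1)
    x, prev = [], 0.0
    for _ in range(500):
        prev = 0.5 * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    stat = sn_statistic(x, 0.0)
    crit = sn_critical_value()
    print(f"T_n = {stat:.3f}, 5% critical value ~ {crit:.1f}, reject: {stat > crit}")
```

Because the long-run variance cancels, rescaling the data or shifting both the data and `mu0` leaves the statistic unchanged. This invariance is what makes the limit pivotal under serial dependence.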