This paper studies recently proposed optimal transport-based multivariate test statistics, namely the rank energy and its variant, the soft rank energy, which is derived from entropically regularized optimal transport, for the unsupervised nonparametric change point detection (CPD) problem. We show that the soft rank energy enjoys both fast rates of statistical convergence and robust continuity properties, which lead to strong performance on real datasets. Our theoretical analyses remove the need for the resampling and out-of-sample extensions previously required to obtain such rates. In contrast, the rank energy suffers from the curse of dimensionality in statistical estimation and, moreover, can signal a change point under arbitrarily small perturbations, which leads to a high rate of false alarms in CPD. Additionally, under mild regularity conditions, we quantify the discrepancy between the soft rank energy and the rank energy in terms of the regularization parameter. Finally, we show that our approach performs favorably in numerical experiments compared to several other optimal transport-based methods as well as maximum mean discrepancy.
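To make the statistic named in the abstract concrete, the following is a minimal sketch of a sliding-window CPD score built on a soft rank energy. It is an illustration under stated assumptions, not the authors' implementation: the function names (`soft_ranks`, `soft_rank_energy`, `cpd_scores`), the uniform reference sample `U` on the unit cube, the plain log-domain Sinkhorn solver, and the parameters `eps`, `n_iter`, and `window` are all hypothetical choices. The soft rank of each pooled sample point is taken to be the barycentric projection of the entropic transport plan onto `U`, and the score is the energy distance between the soft ranks of the two windows.

```python
import numpy as np

def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def soft_ranks(Z, U, eps=0.5, n_iter=100):
    # Entropic OT between the uniform measure on Z and the uniform measure on U
    # (squared Euclidean cost), solved with log-domain Sinkhorn iterations.
    n, m = len(Z), len(U)
    C = ((Z[:, None, :] - U[None, :, :]) ** 2).sum(-1)
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    P = np.exp((f[:, None] + g[None, :] - C) / eps + log_a[:, None] + log_b[None, :])
    # Barycentric projection: each row of P averages the reference points.
    return (P @ U) / P.sum(axis=1, keepdims=True)

def _mean_dist(A, B):
    # Mean pairwise Euclidean distance between rows of A and rows of B.
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)).mean()

def soft_rank_energy(X, Y, U, eps=0.5):
    # Soft ranks are computed on the pooled sample, then compared by energy distance.
    R = soft_ranks(np.vstack([X, Y]), U, eps)
    RX, RY = R[:len(X)], R[len(X):]
    return 2 * _mean_dist(RX, RY) - _mean_dist(RX, RX) - _mean_dist(RY, RY)

def cpd_scores(series, window, U, eps=0.5):
    # Compare the window before each time t with the window after it.
    T = len(series)
    scores = np.zeros(T)
    for t in range(window, T - window + 1):
        scores[t] = soft_rank_energy(series[t - window:t], series[t:t + window], U, eps)
    return scores

# Synthetic example (assumed data): a mean shift at t = 60 in two dimensions.
rng = np.random.default_rng(0)
series = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(3.0, 1.0, (60, 2))])
U = rng.uniform(0.0, 1.0, (40, 2))  # hypothetical reference sample on [0, 1]^2
scores = cpd_scores(series, window=20, U=U)
detected = int(np.argmax(scores))
```

In this sketch a change point would be declared where `scores` exceeds a threshold or attains a local peak; the thresholding rule is left open because the abstract does not specify one. Larger `eps` smooths the rank map, which is the mechanism the abstract credits for robustness to small perturbations, at the cost of a larger gap to the unregularized rank energy.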