We consider detecting change points in the correlation structure of streaming data under minimal assumptions on the underlying data distribution. We construct detection statistics for the dense and sparse change settings based, respectively, on the $\ell_1$ and $\ell_{\infty}$ norms of the squared difference between the vectorized pre- and post-change correlation matrices. We also propose a novel threshold determination algorithm based on sign-flip permutations. This algorithm improves the efficiency of our procedure, particularly when the data dimension is large relative to the window size. We provide theoretical guarantees for the proposed methods in terms of the average run length in the no-change regime and the expected detection delay in the post-change regime. We evaluate the proposed methods on a wide range of simulated datasets, where they achieve small detection delays comparable to those of the exact optimal CUSUM test. Finally, we demonstrate their effectiveness on real-world datasets. These include El Niño event forecasting, where we achieve a state-of-the-art hit rate exceeding 0.86 with near-zero false alarms, and seismic event detection.
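The sketch below illustrates the two detection statistics described above. It makes several assumptions that go beyond the source:

- Sample Pearson correlations are computed over two adjacent sliding windows of equal size `w`.
- Vectorization is over the strict upper triangle, since diagonal entries are always 1.
- A fixed, user-supplied `threshold` triggers the alarm.

The function names and the window layout are hypothetical. The paper's statistics may include normalization or bias corrections that are not shown here. Its thresholds come from the sign-flip permutation procedure, which this sketch does not implement.

```python
import numpy as np


def window_corr_vec(X):
    """Vectorize the strict upper triangle of the sample correlation matrix
    of a window X with shape (n, p). Using the upper triangle is an assumption."""
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices(R.shape[0], k=1)
    return R[iu]


def correlation_change_stats(pre, post):
    """Return (dense, sparse) statistics: the l1 and l_inf norms of the
    entrywise squared difference of the vectorized correlation matrices."""
    d2 = (window_corr_vec(pre) - window_corr_vec(post)) ** 2
    # Entries are nonnegative, so the l1 norm is a sum and l_inf is a max.
    return float(d2.sum()), float(d2.max())


def scan(X, w, threshold, use_sparse=False):
    """Hypothetical online scan: at each time t, compare window X[t-2w:t-w]
    with window X[t-w:t] and return the first t whose statistic exceeds
    `threshold`, or None if no alarm is raised."""
    for t in range(2 * w, X.shape[0] + 1):
        dense, sparse = correlation_change_stats(X[t - 2 * w:t - w], X[t - w:t])
        stat = sparse if use_sparse else dense
        if stat > threshold:
            return t
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre = rng.standard_normal((300, 5))      # independent coordinates
    z = rng.standard_normal((300, 1))
    post = z + 0.1 * rng.standard_normal((300, 5))  # strongly correlated
    stream = np.vstack([pre, post])          # change at t = 300
    print(scan(stream, w=100, threshold=2.0))
```

The dense ($\ell_1$) statistic accumulates small changes spread across many pairs of coordinates. The sparse ($\ell_\infty$) statistic reacts to a large change in a single pair. This matches the dense versus sparse split described above.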