We propose a new, generic approach for detecting multiple change-points in general dependent data, termed random interval distillation (RID). By collecting random intervals with sufficient signal strength and reassembling them into a sequence of informative short intervals, our approach captures shifts in signal characteristics across diverse forms of dependent data, including locally stationary high-dimensional time series and dynamic networks with Markov formation. We further propose a range of secondary refinements, tailored to various data types, that improve localization precision. Notably, for univariate time series and low-rank autoregressive networks, our methods achieve the same minimax optimality as their counterparts for independent data. For practical applications, we introduce a clustering-based, data-driven procedure that exploits the connection between RID and clustering to determine the threshold for signal strength, and that adapts to a wide range of dependent-data scenarios. We also extend the method to identify kinks and changes in signals with piecewise polynomial trends. We examine the effectiveness and usefulness of our methodology through extensive simulation studies and a real data example, and implement it in the R package rid.
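To make the collect-then-distill idea concrete, the following is a minimal sketch under strong simplifying assumptions, not the procedure of the paper or the interface of the R package rid. It assumes a univariate sequence with a piecewise-constant mean, uses a CUSUM statistic as the measure of signal strength, takes a hand-picked threshold in place of the clustering-based choice, and replaces the reassembly step with a greedy shortest-first selection of non-overlapping intervals (closer in spirit to narrowest-over-threshold). The names `max_cusum` and `distill_change_points` are hypothetical.

```python
import math
import random


def max_cusum(x, s, e):
    """Largest absolute CUSUM statistic on x[s:e] and the split index attaining it."""
    n = e - s
    total = sum(x[s:e])
    best_stat, best_split, left_sum = 0.0, s + 1, 0.0
    for k in range(1, n):  # k points fall to the left of the candidate split
        left_sum += x[s + k - 1]
        stat = abs(math.sqrt(n / (k * (n - k))) * (left_sum - k * total / n))
        if stat > best_stat:
            best_stat, best_split = stat, s + k
    return best_stat, best_split


def distill_change_points(x, threshold, n_intervals=1000, min_len=4, seed=0):
    """Illustrative stand-in for RID: keep strong random intervals, then thin them."""
    rng = random.Random(seed)
    n = len(x)

    # Step 1 (collect): draw random intervals and keep those whose
    # signal strength exceeds the (assumed, hand-picked) threshold.
    kept = []
    for _ in range(n_intervals):
        s, e = sorted(rng.sample(range(n + 1), 2))
        if e - s < min_len:
            continue
        stat, split = max_cusum(x, s, e)
        if stat > threshold:
            kept.append((e - s, s, e, split))

    # Step 2 (distill, simplified): prefer short intervals and drop any
    # interval that overlaps one already selected.
    kept.sort()
    selected = []
    for _, s, e, split in kept:
        if all(e <= s2 or s >= e2 for s2, e2, _ in selected):
            selected.append((s, e, split))

    # Each surviving short interval contributes one estimated change-point.
    return sorted(split for _, _, split in selected)


# Usage: a noisy mean-shift signal with changes at indices 100 and 200.
noise = random.Random(42)
signal = [0.0] * 100 + [5.0] * 100 + [0.0] * 100
noisy = [v + noise.gauss(0.0, 0.5) for v in signal]
print(distill_change_points(noisy, threshold=4.0))
```

The shortest-first rule is used here only because it is easy to state; it illustrates why short, informative intervals localize a change well, since each one isolates a single change-point.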