For robust and efficient detection of change points, we introduce a novel methodology, MUSCLE (Multiscale qUantile Segmentation Controlling Local Error), that partitions serial data into multiple segments, each sharing a common quantile. It combines multiple tests for quantile changes across different scales and locations with variational estimation. Unlike the commonly adopted global error control, MUSCLE focuses on local errors defined on individual segments, which significantly improves its power to detect change points. Meanwhile, thanks to a built-in model complexity penalty, it enjoys a finite-sample guarantee: its false discovery rate (the expected proportion of falsely detected change points) is upper bounded by its single tuning parameter. Further, we establish consistency and localization error rates for the estimated change points under mild signal-to-noise-ratio conditions. Both match, up to log factors, the minimax optimality results in the Gaussian setup. All theoretical results hold with serial independence as the only distributional assumption. By incorporating the wavelet tree data structure, we develop an efficient dynamic programming algorithm for computing MUSCLE. Extensive simulations and real data applications in electrophysiology and geophysics demonstrate its competitiveness and effectiveness. An implementation is available as the R package muscle on GitHub.
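To make the idea of testing for quantile changes over different scales and locations more concrete, here is a minimal Python sketch. It is not the statistic used by MUSCLE and not the API of the muscle R package. It assumes continuously distributed, serially independent data, so that on a segment whose beta-quantile equals q the indicators 1{x_i <= q} behave like independent Bernoulli(beta) draws. The function name `flag_quantile_violations`, the restriction to dyadic interval lengths, and the fixed `threshold` are all hypothetical choices made for illustration.

```python
from itertools import accumulate
from math import sqrt


def flag_quantile_violations(x, q, beta, threshold=3.0):
    """Flag intervals on which q looks implausible as the beta-quantile.

    Illustrative only: a normal-approximation check on dyadic interval
    lengths, with an arbitrary fixed threshold (an assumption, not the
    scale-dependent calibration a real multiscale method would use).
    Requires 0 < beta < 1.
    """
    n = len(x)
    # prefix[k] = number of observations among x[:k] that are <= q
    prefix = [0] + list(accumulate(1 if v <= q else 0 for v in x))
    flagged = []
    length = 2
    while length <= n:
        # Standard deviation of a Binomial(length, beta) count
        sd = sqrt(length * beta * (1.0 - beta))
        for start in range(0, n - length + 1):
            count = prefix[start + length] - prefix[start]
            z = abs(count - length * beta) / sd
            if z > threshold:
                # Half-open interval [start, start + length)
                flagged.append((start, start + length))
        length *= 2  # move to the next (coarser) scale
    return flagged


# A sequence whose median jumps from 0 to 10 halfway through:
# treating 5.0 as a common median for the whole sequence is rejected
# on some local intervals.
shifted = [0.0] * 32 + [10.0] * 32
violations = flag_quantile_violations(shifted, q=5.0, beta=0.5)
print(len(violations) > 0)
```

A segmentation method built on such local checks would accept a candidate segment only if no interval inside it is flagged. How the thresholds are calibrated per scale and how segments are chosen are exactly the parts this sketch leaves out.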