Determining the number of change-points is a fundamental first step in change-point detection, as it lays the groundwork for subsequent estimation of change-point locations. While the existing literature offers various methods for consistently estimating the number of change-points, these methods typically yield a single point estimate with no assurance that it recovers the true number of changes in a specific dataset. Moreover, achieving consistency often hinges on stringent conditions that can be challenging to verify in practice. To address these issues, we introduce a unified test-inverse procedure to construct a confidence set for the number of change-points. The proposed confidence set provides a set of possible values that contains the true number of change-points with a specified level of confidence. We further prove that the confidence set is sufficiently narrow to be powerful and informative by deriving the order of its cardinality. Remarkably, this confidence set can be established under more relaxed conditions than those required by most point-estimation techniques. We also advocate multiple-splitting procedures to enhance stability and extend the proposed method to heavy-tailed and dependent settings. As a byproduct, the constructed confidence set can also be used to assess the effectiveness of point-estimation algorithms. Through extensive simulation studies, we demonstrate the superior performance of our confidence set approach. Additionally, we apply this method to analyze a bladder tumor microarray dataset. Supplementary Material, including proofs of all theoretical results, computer code, the R package, and extended simulation studies, is available online.