We study nonparametric change-point detection for high-dimensional data in regimes where inference must be performed from small batches of observations. Our primary focus is the high-dimensional, low-sample-size (HDLSS) regime, in which the sequence length is fixed while the ambient dimension diverges. We propose a dimension-averaged angular kernel scan framework for detecting marginal distributional shifts. The statistic aggregates bounded one-dimensional angular discrepancies across coordinates. This yields a fully nonparametric, hyperparameter-free, and moment-agnostic estimator that remains well defined without specifying, estimating, or assuming finite marginal moments, for example under heavy-tailed or contaminated distributions. For the offline single-change problem, we show that the population mean of the statistic factorizes exactly into a universal deterministic shape function and a scalar signal factor. We also characterize the exact null covariance structure up to a scalar variance factor. Both results hold for any fixed sample size and dimension. We further establish an HDLSS multivariate central limit theorem under cross-coordinate strong mixing. It yields a variance-calibrated, asymptotically distribution-free test with asymptotic type-I error control, together with lower bounds on power and localization accuracy. We then extend the offline procedure to fixed-window sequential monitoring of high-dimensional streaming data, and we obtain average run length (ARL) calibration and bounds on Pollak's worst-case expected detection delay (EDD). Simulation studies show that the proposed method accurately detects and localizes changes in many challenging HDLSS and streaming settings where moment-based or hyperparameter-sensitive procedures can be highly unstable or inaccurate.
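The abstract does not define the angular kernel or the scan weighting. The sketch below therefore shows only the general shape of a dimension-averaged scan: a bounded per-coordinate discrepancy, averaged over coordinates, then maximized over candidate split points. It rests on three assumptions, none of which comes from the paper:

- `math.atan` serves as a hypothetical bounded stand-in for the angular transform.
- The per-coordinate discrepancy is an energy-distance-type V-statistic computed on the transformed values.
- The segment-size weight t(n - t)/n^2 is loosely modeled on energy-statistic scans.

All function names are illustrative. The code is meant to show why boundedness removes the need for moment assumptions. It is not the authors' statistic.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _bounded_angle(x: float) -> float:
    # ASSUMPTION: hypothetical bounded map R -> (-pi/2, pi/2) standing in for
    # the paper's (unspecified) angular kernel.
    return math.atan(x)


def _mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    # Average |u - v| over all pairs (u, v) in a x b (V-statistic form).
    return sum(abs(u - v) for u in a for v in b) / (len(a) * len(b))


def coordinate_discrepancy(before: Sequence[float], after: Sequence[float]) -> float:
    # Energy-distance-type discrepancy between two empirical marginals after
    # the bounded transform. Each |u - v| term lies in [0, pi), so the value is
    # finite regardless of the tails; no moments are estimated.
    a = [_bounded_angle(x) for x in before]
    b = [_bounded_angle(x) for x in after]
    return 2.0 * _mean_abs_diff(a, b) - _mean_abs_diff(a, a) - _mean_abs_diff(b, b)


def dimension_averaged_scan(
    data: List[List[float]], min_seg: int = 1
) -> Tuple[int, Dict[int, float]]:
    """Scan splits t (rows [0, t) vs rows [t, n)) of an n x p sample.

    Returns the maximizing split and the scan profile. The weight
    t(n - t)/n^2 is a HYPOTHETICAL choice, not taken from the paper.
    """
    n, p = len(data), len(data[0])
    if n < 2 * min_seg:
        raise ValueError("need at least 2 * min_seg observations")
    profile: Dict[int, float] = {}
    for t in range(min_seg, n - min_seg + 1):
        total = 0.0
        for j in range(p):
            before = [row[j] for row in data[:t]]
            after = [row[j] for row in data[t:]]
            total += coordinate_discrepancy(before, after)
        # Average the bounded discrepancies over coordinates, then weight.
        profile[t] = (t * (n - t) / n**2) * (total / p)
    t_hat = max(profile, key=profile.get)
    return t_hat, profile


if __name__ == "__main__":
    n, p = 8, 50
    data = [[0.0] * p for _ in range(4)] + [[5.0] * p for _ in range(4)]
    data[0][0] = 1e12  # one gross outlier; atan keeps its contribution bounded
    t_hat, profile = dimension_averaged_scan(data)
    print(t_hat)  # 4 for this construction
```

Because each coordinate contributes a bounded term and the terms are averaged over p coordinates, a single extreme value can shift the profile by at most O(1/p). In a moment-based statistic, the same outlier would dominate the sum.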