We study a mean change point testing problem for high-dimensional data with exponentially- or polynomially-decaying tails. In each case, we consider dense and sparse regimes separately, according to the $\ell_0$-norm of the mean change vector. For the first time in the change point literature, we characterise the boundary between the dense and sparse regimes under these two tail conditions, and we propose novel testing procedures that attain optimal rates in each of the four regimes up to a poly-iterated logarithmic factor. Compared with previous results under Gaussian assumptions, our results quantify the cost of heavy-tailedness on the fundamental difficulty of change point testing for high-dimensional data. Specifically, when the error vectors follow sub-Weibull distributions, a CUSUM-type statistic is shown to achieve the minimax testing rate up to a factor of $\sqrt{\log\log(8n)}$. When the error distributions have polynomially-decaying tails, admitting bounded $\alpha$-th moments for some $\alpha \geq 4$, we introduce a median-of-means-type test statistic that achieves a near-optimal testing rate in both the dense and sparse regimes. In the sparse regime, we further propose a computationally efficient test that attains exact optimality. Surprisingly, in the even more challenging case $2 \leq \alpha < 4$, we uncover a new phenomenon: the minimax testing rate has no sparse regime, i.e.\ testing sparse changes is information-theoretically as hard as testing dense changes. This phenomenon implies a phase transition of the minimax testing rates at $\alpha = 4$.
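To make the two building blocks named in the abstract concrete, the following is a minimal Python sketch of a CUSUM contrast with a dense (squared $\ell_2$) aggregation and of a scalar median-of-means estimate. It is an illustration only and not the procedures proposed in the paper: the function names `cusum_vector`, `dense_cusum_stat` and `median_of_means`, the centring by $p$ (which assumes independent, unit-variance coordinates) and the equal-size blocking are all assumptions of this sketch, and no calibrated rejection threshold is given.

```python
import math
from statistics import median


def cusum_vector(rows, t):
    """CUSUM contrast at split point t.

    rows: list of n observations, each a list of p floats.
    Returns the p-vector sqrt(t(n-t)/n) * (mean of first t rows - mean of the rest).
    """
    n, p = len(rows), len(rows[0])
    scale = math.sqrt(t * (n - t) / n)
    contrast = []
    for j in range(p):
        left = sum(r[j] for r in rows[:t]) / t
        right = sum(r[j] for r in rows[t:]) / (n - t)
        contrast.append(scale * (left - right))
    return contrast


def dense_cusum_stat(rows):
    """Maximum over split points of the centred squared l2 norm of the CUSUM vector.

    Subtracting p assumes independent coordinates with unit variance
    (an assumption of this sketch, not a statement about the paper).
    """
    n, p = len(rows), len(rows[0])
    return max(
        sum(c * c for c in cusum_vector(rows, t)) - p
        for t in range(1, n)
    )


def median_of_means(values, n_blocks):
    """Median of block means: a mean estimate that is robust to heavy tails.

    Splits values into n_blocks consecutive blocks of equal size
    (leftover trailing values are dropped) and returns the median block mean.
    """
    size = len(values) // n_blocks
    if size == 0:
        raise ValueError("n_blocks must not exceed the number of values")
    block_means = [
        sum(values[b * size:(b + 1) * size]) / size for b in range(n_blocks)
    ]
    return median(block_means)
```

A large value of `dense_cusum_stat` is evidence of a change spread across many coordinates, while `median_of_means` shows why blocking helps under polynomial tails: a single extreme observation can distort at most one block mean, so the median of the block means is largely unaffected.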