We study a hypothesis testing problem in the context of high-dimensional changepoint detection. Given a matrix $X \in \mathbb{R}^{p \times n}$ with independent Gaussian entries, the goal is to determine whether a sparse, non-null fraction of rows in $X$ exhibits a shift in mean at a common index between $1$ and $n$. We focus on three aspects of this problem: the sparsity of non-null rows, the presence of a single, common changepoint in the non-null rows, and the signal strength associated with the changepoint. Within an asymptotic regime relating the data dimensions $n$ and $p$ to the signal sparsity and strength, we characterize the information-theoretic limits of the testing problem by a formula that determines whether the sum of the Type I and Type II error probabilities tends to zero or is bounded away from zero. The formula, called the \emph{detection boundary}, is a curve that separates the parameter space into a detectable region and an undetectable region. We show that a Berk--Jones type test statistic can detect the presence of a sparse non-null fraction of rows, and does so adaptively throughout the detectable region. Conversely, within the undetectable region, no test can consistently distinguish the signal from noise.
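To make the setting concrete, the following is a minimal illustrative sketch, not the statistic analyzed in the paper. It assumes unit-variance noise, a one-sided mean shift of size `delta` in the first `n_nonnull` rows starting at column `tau`, and a scan that applies the classical Berk--Jones statistic to two-sided row-wise p-values at each candidate changepoint. The function names, the parameter values, and the restriction to the smallest half of the p-values are all assumptions made for illustration.

```python
import numpy as np
from math import erfc, sqrt


def simulate(p, n, n_nonnull, tau, delta, rng):
    """Draw a p x n standard Gaussian matrix; shift the mean of the first
    n_nonnull rows by delta from column tau onward (hypothetical setup)."""
    X = rng.standard_normal((p, n))
    X[:n_nonnull, tau:] += delta
    return X


def cusum_z(X, t):
    """Row-wise z-scores contrasting columns [0, t) with columns [t, n).
    Under the null with unit variance, each score is N(0, 1)."""
    n = X.shape[1]
    left = X[:, :t].mean(axis=1)
    right = X[:, t:].mean(axis=1)
    return (right - left) * np.sqrt(t * (n - t) / n)


def berk_jones(pvals):
    """Classical Berk--Jones statistic over the smallest half of the p-values:
    max_i m * K(i/m, p_(i)) restricted to p_(i) < i/m, where K is the
    Bernoulli Kullback--Leibler divergence."""
    m = len(pvals)
    k = m // 2
    ps = np.clip(np.sort(pvals), 1e-300, 1 - 1e-16)
    a = np.arange(1, k + 1) / m
    b = ps[:k]
    kl = a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))
    kl = np.where(b < a, kl, 0.0)
    return m * kl.max()


def scan_statistic(X):
    """Maximize the Berk--Jones statistic over candidate changepoints
    t = 1, ..., n - 1. Returns (statistic, maximizing t)."""
    n = X.shape[1]
    two_sided_p = np.vectorize(lambda z: erfc(abs(z) / sqrt(2)))
    best_stat, best_t = -np.inf, None
    for t in range(1, n):
        stat = berk_jones(two_sided_p(cusum_z(X, t)))
        if stat > best_stat:
            best_stat, best_t = stat, t
    return best_stat, best_t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_null = simulate(p=200, n=100, n_nonnull=0, tau=50, delta=0.0, rng=rng)
    X_alt = simulate(p=200, n=100, n_nonnull=20, tau=50, delta=1.0, rng=rng)
    print("null:", scan_statistic(X_null))
    print("alternative:", scan_statistic(X_alt))
```

In practice a rejection threshold would have to be calibrated, for example by simulating the scan statistic under the null. The sketch makes no attempt to reproduce the asymptotic calibration or the detection boundary described above.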