Classical asymptotic theory for statistical inference usually calibrates a statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. Recently, much effort has been devoted to understanding how these methods behave in high-dimensional settings, where $d$ and $n$ increase to infinity together. This often leads to different inference procedures depending on the assumptions made about the dimensionality, leaving the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference: developing methods whose validity does not depend on any assumption on $d$ versus $n$. We introduce an approach that uses variational representations of existing test statistics, along with sample splitting and self-normalization, to produce a refined test statistic with a Gaussian limiting distribution, regardless of how $d$ scales with $n$. The resulting statistic can be viewed as a careful modification of degenerate U-statistics, dropping diagonal blocks and retaining off-diagonal blocks. We illustrate our technique on some classical problems, including one-sample mean and covariance testing, and show that our tests have minimax rate-optimal power against appropriate local alternatives. In most settings, our cross U-statistic matches the high-dimensional power of the corresponding (degenerate) U-statistic up to a $\sqrt{2}$ factor.
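The recipe described above (sample splitting followed by self-normalization) can be sketched for the one-sample mean problem, testing $H_0: \mathbb{E}[X] = 0$. This is a minimal illustration under stated assumptions, not necessarily the exact statistic analyzed in the paper: the function names `cross_mean_statistic` and `one_sided_p_value`, the even split, and the simulated Gaussian data are all choices made here for illustration. The second half of the sample supplies a projection direction, and the first half is reduced to one-dimensional projections that are then studentized and compared against a standard normal quantile.

```python
import math
import numpy as np


def cross_mean_statistic(X):
    """Studentized cross statistic for H0: E[X] = 0 (illustrative sketch).

    The split into two halves and the use of the second-half mean as the
    projection direction are assumptions of this sketch.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    m = n // 2
    first, second = X[:m], X[m:]

    # Direction estimated from the second half only.
    direction = second.mean(axis=0)

    # One-dimensional projections of the first half onto that direction.
    # Their average equals the mean of X_i^T X_j over pairs with i in the
    # first half and j in the second half, i.e. only off-diagonal blocks
    # of the Gram matrix are used.
    proj = first @ direction

    # Self-normalization: divide by the sample standard deviation.
    return math.sqrt(m) * proj.mean() / proj.std(ddof=1)


def one_sided_p_value(t):
    """Upper-tail p-value under a standard normal reference distribution."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


# Simulated example with n = 100 samples in d = 20 dimensions.
rng = np.random.default_rng(0)
n, d = 100, 20
X_null = rng.standard_normal((n, d))   # mean zero: H0 holds
X_alt = X_null + 1.0                   # every coordinate shifted by 1

t_null = cross_mean_statistic(X_null)
t_alt = cross_mean_statistic(X_alt)
p_null = one_sided_p_value(t_null)
p_alt = one_sided_p_value(t_alt)
```

Because the direction and the projections come from disjoint halves, the calibration step is the same normal comparison whatever the ratio $d/n$ is; no separate low-dimensional or high-dimensional regime has to be chosen in the code.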