In this paper, we develop invariance-based procedures for testing and inference in high-dimensional regression models. These procedures, also known as randomization tests, provide several important advantages. First, for the global null hypothesis of significance, our test is valid in finite samples. It is also simple to implement and comes with finite-sample guarantees on statistical power. Remarkably, despite its simplicity, this testing idea has escaped the attention of earlier analytical work, which mainly concentrated on complex high-dimensional asymptotic methods. Under an additional assumption of Gaussian design, we show that this test also achieves the minimax optimal rate against certain nonsparse alternatives, a type of result that is rare in the literature. Second, for partial null hypotheses, we propose residual-based tests and derive theoretical conditions for their validity. These tests can be made powerful by constructing the test statistic in a way that, first, selects the important covariates (e.g., through Lasso) and then orthogonalizes the nuisance parameters. We illustrate our results through extensive simulations and applied examples. One consistent finding is that the strong finite-sample guarantees associated with our procedures result in added robustness when it comes to handling multicollinearity and heavy-tailed covariates.
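To make the global-null idea concrete, the following is a minimal sketch of one possible randomization test, not the authors' exact procedure. It assumes the errors are exchangeable, so that under the global null (all regression coefficients equal to zero) permuting the response leaves its distribution unchanged. The test statistic, the squared norm of X^T y after centering, and the function name `global_null_randomization_test` are illustrative choices that do not come from the abstract.

```python
import numpy as np


def global_null_randomization_test(X, y, n_draws=999, seed=0):
    """Permutation test of H0: all coefficients are zero in y = a + X b + e.

    Assumption (not from the abstract): the errors e are exchangeable, so
    under H0 every permutation of y is equally likely given X.
    Returns the observed statistic and a finite-sample valid p-value.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)   # center covariates (absorbs the intercept)
    yc = y - y.mean()         # centering commutes with permuting y

    def stat(v):
        # Illustrative statistic: squared norm of the covariate-response
        # inner products. It is computable even when p > n.
        return float(np.sum((Xc.T @ v) ** 2))

    t_obs = stat(yc)
    t_perm = np.array([stat(rng.permutation(yc)) for _ in range(n_draws)])
    # The "+1" terms count the observed data as one of the draws, which
    # is what makes the p-value valid for any number of draws.
    p_value = (1 + np.sum(t_perm >= t_obs)) / (1 + n_draws)
    return t_obs, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 300                          # more covariates than samples
    X = rng.standard_normal((n, p))
    y_null = rng.standard_normal(n)          # global null holds
    y_alt = 5.0 * X[:, 0] + rng.standard_normal(n)  # one strong covariate
    print("p-value under the null:", global_null_randomization_test(X, y_null)[1])
    print("p-value under the alternative:", global_null_randomization_test(X, y_alt)[1])
```

Because the p-value is computed from the permutation distribution itself, no asymptotic approximation or estimate of the noise level is needed, which is why a test of this kind remains usable when the number of covariates exceeds the sample size.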