While powerful methods have been developed for high-dimensional hypothesis testing assuming orthogonal parameters, current approaches struggle to generalize to the more common non-orthogonal case. We propose Stable Distillation (SD), a simple paradigm for iteratively extracting independent pieces of information from observed data, assuming a parametric model. When applied to hypothesis testing for large regression models, SD orthogonalizes the effect estimates of non-orthogonal predictors by judiciously introducing noise into the observed outcome vector, yielding mutually independent p-values across predictors. Simulations and a real regression example using US campaign contributions show that SD is a scalable approach for non-orthogonal designs that matches or exceeds the power of existing methods against sparse alternatives. Although we present explicit SD algorithms only for hypothesis testing in ordinary least squares and logistic regression, we provide general guidance for deriving SD procedures and improving their power.