Motivated by simultaneous association analysis in the presence of latent confounders, this paper studies the large-scale hypothesis testing problem for high-dimensional confounded linear models with both non-asymptotic and asymptotic false discovery control. Such a model covers a wide range of practical settings in which both the response and the predictors may be confounded. With high-dimensional predictors and unobservable confounders, simultaneous inference with provable guarantees is highly challenging, and the unknown strong dependence among the confounded covariates makes the challenge even more pronounced. This paper first introduces a decorrelating procedure that shrinks the confounding effect and weakens the correlations among the predictors, and then performs debiasing under the decorrelated design, starting from a biased initial estimator. An asymptotic normality result for the debiased estimator is then established, and standardized test statistics are constructed. Furthermore, a simultaneous inference procedure is proposed to identify significant associations, and both finite-sample and asymptotic false discovery bounds are provided. The non-asymptotic result is general and model-free, and is of independent interest. We also prove that, under a minimal signal strength condition, all associations can be successfully detected with probability tending to one. Simulation and real data studies are carried out to evaluate the performance of the proposed approach and to compare it with competing methods.