Many high-dimensional data sets suffer from hidden confounding. When hidden confounders affect both the predictors and the response in a high-dimensional regression problem, standard methods yield biased estimates. This paper substantially extends previous work on spectral deconfounding for high-dimensional linear models to the nonlinear setting, and thereby establishes a proof of concept that spectral deconfounding is valid for general nonlinear models. Concretely, we propose an algorithm for estimating high-dimensional additive models in the presence of hidden dense confounding; additive models are arguably a simple yet practically useful class of nonlinear models. We prove consistency and derive convergence rates for our method, and we evaluate it on synthetic data and on a genetic data set.
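As a rough illustration of what a spectral transformation can look like, the sketch below implements the trim transform known from earlier work on spectral deconfounding for linear models: singular values of the design matrix above the median are shrunk to the median, which damps the directions in which dense confounding is expected to concentrate. The abstract does not state which transformation the proposed algorithm uses, so the choice of the trim transform, the function name `trim_transform`, and the simulated data are all assumptions made for illustration only. The additive-model fitting step itself is not shown.

```python
import numpy as np


def trim_transform(X):
    """Hypothetical helper: build the trim transform Q for a design matrix X.

    With X = U diag(d) V^T, Q shrinks every singular value above the median
    tau down to tau and leaves the remaining directions unchanged.
    """
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    tau = np.median(d)
    # Shrinkage factor min(d_i, tau) / d_i; zero singular values are left alone.
    scale = np.ones_like(d)
    pos = d > 0
    scale[pos] = np.minimum(d[pos], tau) / d[pos]
    # Q = I - U diag(1 - scale) U^T, which equals U diag(scale) U^T when U is square.
    Q = np.eye(X.shape[0]) - (U * (1.0 - scale)) @ U.T
    return Q, tau


# Simulated example (assumed setup): q hidden confounders H affect X and y.
rng = np.random.default_rng(0)
n, p, q = 50, 200, 2
H = rng.normal(size=(n, q))
X = H @ rng.normal(size=(q, p)) + rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + H @ np.ones(q) + 0.1 * rng.normal(size=n)

Q, tau = trim_transform(X)
X_t, y_t = Q @ X, Q @ y  # transformed data that a downstream estimator would use
```

In a pipeline of this kind, an additive model (for example, a basis expansion per predictor with a sparsity-inducing penalty) would then be fitted to the transformed data rather than to the raw `X` and `y`. That fitting step is left out here because the abstract gives no details on it.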