Many high-dimensional data sets suffer from hidden confounding that affects both the predictors and the response of interest. In such situations, standard regression methods yield biased estimates. This paper substantially extends previous work on spectral deconfounding for high-dimensional linear models to the nonlinear setting, and thereby establishes a proof of concept that spectral deconfounding is valid for general nonlinear models. Concretely, we propose an algorithm to estimate high-dimensional sparse additive models in the presence of hidden dense confounding; arguably, this is a simple yet practically useful nonlinear setting. We prove consistency, derive convergence rates for our method, and evaluate it on synthetic data and a genetic data set.
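To give a rough sense of what a spectral transformation does, the following sketch builds one common choice from the linear-model literature, the trim transform, which caps the singular values of the design matrix at their median. This is an illustration under stated assumptions, not the algorithm proposed in this paper: the abstract does not specify the transformation used, and the function name `trim_transform`, the simulated data, and all dimensions are hypothetical.

```python
import numpy as np


def trim_transform(X):
    """Return an n x n matrix Q that caps the singular values of X at their median.

    Assumption: this is the "trim" spectral transformation known from the
    linear-model setting; it may differ from what the paper actually uses.
    """
    n = X.shape[0]
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    tau = np.median(d)
    # Shrink factor per singular direction: 1 if d_i <= tau, tau / d_i otherwise.
    scale = np.ones_like(d)
    positive = d > 0
    scale[positive] = np.minimum(d[positive], tau) / d[positive]
    # Q acts as the identity outside the column space of U.
    return np.eye(n) - (U * (1.0 - scale)) @ U.T


# Hypothetical simulation: a few hidden factors H densely affect all predictors.
rng = np.random.default_rng(0)
n, p, q = 100, 200, 3
H = rng.normal(size=(n, q))          # hidden confounders
Gamma = rng.normal(size=(q, p))      # dense loadings on the predictors
X = H @ Gamma + rng.normal(size=(n, p))

Q = trim_transform(X)
d_before = np.linalg.svd(X, compute_uv=False)
d_after = np.linalg.svd(Q @ X, compute_uv=False)
print("largest singular value before:", d_before[0])
print("largest singular value after: ", d_after[0])
```

In the linear case, one would then fit a sparse regression of `Q @ Y` on `Q @ X`. The dense confounding shows up as a few very large singular values of `X`, and capping them reduces its influence. How the transformed data enter the additive-model fit is specific to the paper and is not reproduced here.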