Many high-dimensional data sets suffer from hidden confounding that affects both the predictors and the response of interest. In such situations, standard regression methods yield biased estimates. This paper substantially extends previous work on spectral deconfounding for high-dimensional linear models to the nonlinear setting, and thereby establishes a proof of concept that spectral deconfounding is valid for general nonlinear models. Concretely, we propose an algorithm to estimate high-dimensional sparse additive models in the presence of hidden dense confounding, arguably a simple yet practically useful nonlinear setting. We prove consistency and convergence rates for our method and evaluate it on synthetic data and a genetic data set.
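
To make the setting concrete, the deconfounding idea summarized above can be written out as follows. This is a sketch under stated assumptions, not the paper's exact formulation: the notation ($f_j$, $H$, $\psi$, $\Psi$, $E$, $e$, $Q$, $\tau$, $\lambda$, $\mathrm{pen}$) is introduced here for illustration, the spectral transformation shown is the "trim transform" commonly used in the spectral deconfounding literature for linear models, and the penalty is left generic. We also assume $p \ge n$ with a full-rank design matrix, so that $U$ below is an $n \times n$ orthogonal matrix.

```latex
% Assumed confounded sparse additive model (illustrative notation):
% H are unobserved confounders acting on both X and Y.
\begin{align*}
  Y &= \beta_0 + \sum_{j=1}^{p} f_j(X_j) + H^\top \psi + e, \\
  X &= \Psi^\top H + E .
\end{align*}

% Trim transform built from the SVD of the n x p design matrix \mathbf{X}:
% singular values above the threshold \tau are shrunk down to \tau.
\begin{align*}
  \mathbf{X} &= U D V^\top, \qquad D = \operatorname{diag}(d_1, \dots, d_n), \\
  Q &= U \operatorname{diag}\!\left( \frac{\min(d_1, \tau)}{d_1}, \dots,
        \frac{\min(d_n, \tau)}{d_n} \right) U^\top,
  \qquad \tau = \operatorname{median}(d_1, \dots, d_n).
\end{align*}

% Generic deconfounded estimator: least squares on the transformed data
% plus a sparsity-inducing penalty on the component functions (assumption).
\begin{equation*}
  \hat f \in \arg\min_{\beta_0,\, f_1, \dots, f_p}
  \frac{1}{n} \Bigl\| Q \Bigl( \mathbf{Y} - \beta_0 \mathbf{1}
    - \sum_{j=1}^{p} f_j(\mathbf{X}_j) \Bigr) \Bigr\|_2^2
  + \lambda \, \mathrm{pen}(f_1, \dots, f_p).
\end{equation*}
```

The intuition behind this choice is that dense confounding inflates the leading singular values of the design matrix. Since $Q\mathbf{X} = U \operatorname{diag}(\min(d_i, \tau)) V^\top$, the transform caps those directions at $\tau$ and thereby damps the confounding signal before the sparse additive fit.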