We investigate high-dimensional linear regression in which the noise is correlated with Gaussian covariates. In regression models, such correlated noise is called endogeneity; it arises from unobserved variables, among other sources, and is a major problem setting in causal inference and econometrics. When the covariates are high-dimensional, it has been common to assume that the true parameters are sparse and to estimate them with regularization, even under endogeneity. However, when sparsity does not hold, it is not well understood how to control endogeneity and high dimensionality simultaneously. In this paper, we demonstrate that an estimator without regularization can achieve consistency, i.e., benign overfitting, under certain assumptions on the covariance matrix. Specifically, we show that the error of this estimator converges to zero when the covariance matrices of the correlated noise and the instrumental variables satisfy a condition on their eigenvalues. We consider several extensions that relax these conditions and conduct experiments to support our theoretical findings. As a technical contribution, we utilize the convex Gaussian minimax theorem (CGMT) in our dual problem and extend the CGMT itself.