We investigate high-dimensional linear regression with Gaussian covariates in which the noise is correlated with the covariates. This correlation, known as endogeneity in regression models, often arises from unobserved variables and other factors, and it has been a major challenge in causal inference and econometrics. When the covariates are high-dimensional, it is common to assume that the true parameters are sparse and to estimate them with regularization, even under endogeneity. However, when sparsity does not hold, how to control endogeneity and high dimensionality simultaneously is not well understood. This study demonstrates that an estimator without regularization can achieve consistency, that is, benign overfitting, under certain assumptions on the covariance matrix. Specifically, our results show that the error of this estimator converges to zero when the covariance matrices of the correlated noise and the instrumental variables satisfy a condition on their eigenvalues. We consider several extensions that relax these conditions and conduct experiments to support our theoretical findings. As a technical contribution, we apply the convex Gaussian minimax theorem (CGMT) to our dual problem and extend the CGMT itself.