In this paper, we study the asymptotics of linear regression with non-Gaussian covariates that exhibit a linear dependency structure, departing from the standard independence assumption. We model the covariates as stochastic processes with spatio-temporal covariance and analyze the performance of ridge regression in the high-dimensional proportional regime, where the number of samples and the feature dimension grow proportionally. We prove a Gaussian universality theorem showing that the asymptotics are invariant under replacing the non-Gaussian covariates with Gaussian vectors of the same mean and covariance; for these Gaussian surrogates, tools from random matrix theory yield precise characterizations of the estimation error. This characterization takes the form of a fixed-point equation involving the spectral properties of the spatio-temporal covariance matrices, which can be solved efficiently. We then study optimal regularization, overparameterization, and the double descent phenomenon in the context of dependent data. Simulations validate our theoretical predictions and shed light on how dependencies influence the estimation error and the choice of the regularization parameter.
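To make the setting concrete, the following is a minimal Monte Carlo sketch, not the authors' method or their fixed-point characterization. The abstract does not specify that characterization here. The sketch assumes a separable spatio-temporal model X = A^{1/2} Z B^{1/2}. In this model, A (n x n) couples samples in time, B (p x p) couples features in space, and both are illustrative AR(1)-type Toeplitz matrices. Z has i.i.d. Rademacher entries, which makes the covariates non-Gaussian. The noise is i.i.d. Gaussian. All function names (`ar1_cov`, `sample_covariates`, `estimation_error`, etc.) and default parameters are hypothetical. The sketch compares Rademacher covariates against a Gaussian surrogate with matching covariance, which gives an empirical look at universality. It also sweeps p/n at near-zero regularization, where a double-descent spike near p = n is expected.

```python
import numpy as np


def ar1_cov(size: int, rho: float) -> np.ndarray:
    """Toeplitz covariance with entries rho**|i - j| (illustrative AR(1)-type choice)."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def sym_sqrt(mat: np.ndarray) -> np.ndarray:
    """Symmetric square root of a PSD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def sample_covariates(sqrt_a, sqrt_b, rng, gaussian=False):
    """Draw X = A^{1/2} Z B^{1/2}: rows are samples (time), columns are features (space).

    Z has i.i.d. zero-mean, unit-variance entries: Rademacher (non-Gaussian) by
    default, or standard normal when gaussian=True (the Gaussian surrogate with
    the same mean and covariance). The separable model is an assumption.
    """
    shape = (sqrt_a.shape[0], sqrt_b.shape[0])
    if gaussian:
        z = rng.standard_normal(shape)
    else:
        z = rng.choice([-1.0, 1.0], size=shape)
    return sqrt_a @ z @ sqrt_b


def ridge(x, y, lam):
    """Ridge estimate minimizing ||y - X b||^2 / n + lam * ||b||^2."""
    n, p = x.shape
    return np.linalg.solve(x.T @ x / n + lam * np.eye(p), x.T @ y / n)


def estimation_error(n, p, lam, rho_time=0.5, rho_space=0.3, noise=0.5,
                     trials=20, seed=0, gaussian=False):
    """Monte Carlo estimate of E||beta_hat - beta||^2 under the assumed model."""
    rng = np.random.default_rng(seed)
    sqrt_a = sym_sqrt(ar1_cov(n, rho_time))   # temporal (sample) covariance factor
    sqrt_b = sym_sqrt(ar1_cov(p, rho_space))  # spatial (feature) covariance factor
    errs = []
    for _ in range(trials):
        beta = rng.standard_normal(p) / np.sqrt(p)  # signal with E||beta||^2 = 1
        x = sample_covariates(sqrt_a, sqrt_b, rng, gaussian)
        y = x @ beta + noise * rng.standard_normal(n)  # i.i.d. noise (assumption)
        errs.append(np.sum((ridge(x, y, lam) - beta) ** 2))
    return float(np.mean(errs))


if __name__ == "__main__":
    # Universality check: Rademacher vs. Gaussian entries with matching covariance.
    print(estimation_error(200, 100, 0.1), estimation_error(200, 100, 0.1, gaussian=True))
    # Near-ridgeless sweep over p/n; the error is expected to spike near p = n.
    for p in (50, 90, 100, 110, 200):
        print(p / 100, estimation_error(100, p, 1e-4))
```

Varying `rho_time` and `rho_space` shows how dependence shifts the error curve. Varying `lam` shows how it shifts the best regularization level. These empirical curves are only a rough stand-in for the paper's exact asymptotic predictions.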
翻译:本文研究非高斯协变量下线性回归的渐近性质,其中协变量呈现线性依赖结构,突破了独立性标准假设。我们使用时-空协方差随机过程对协变量建模,并在高维比例体系(样本数与特征维度成比例增长)中分析岭回归的性能。证明了一个高斯普适性定理,表明将非高斯协变量替换为保持均值与协方差的随机高斯向量时渐近性质保持不变,这允许运用随机矩阵理论工具推导估计误差的精确表征。估计误差通过涉及时-空协方差矩阵谱特性的不动点方程表征,可实现高效计算。随后我们研究了依赖数据背景下的最优正则化、过参数化及双下降现象。仿真实验验证了理论预测,揭示了数据依赖性如何影响估计误差与正则化参数的选择。