In this paper we study the asymptotics of linear regression in settings where the covariates exhibit a linear dependency structure, departing from the standard assumption of independence. We model the covariates using stochastic processes with spatio-temporal covariance and analyze the performance of ridge regression in the high-dimensional proportional regime, where the number of samples and the feature dimension grow proportionally. We prove a Gaussian universality theorem, showing that the asymptotics are unchanged when the covariates are replaced with Gaussian vectors that have the same mean and covariance. Next, leveraging tools from random matrix theory, we derive a precise characterization of the estimation error in terms of a fixed-point equation involving the spectral properties of the spatio-temporal covariance matrices, which enables efficient computation. We then study optimal regularization, overparameterization, and the double descent phenomenon in the context of dependent data. Simulations validate our theoretical predictions and shed light on how dependencies influence the estimation error and the choice of regularization parameter.
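To make the setting concrete, the following is a minimal simulation sketch, not the paper's own code. It assumes a separable (Kronecker) spatio-temporal covariance with AR(1) structure across both samples and features; the abstract does not state the covariance form or the exact ridge normalization, so these choices and the names `ar1_covariance` and `simulate_ridge_error` are illustrative assumptions.

```python
import numpy as np


def ar1_covariance(size, rho):
    """AR(1) covariance matrix with entries rho**|i - j| (assumed form)."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def simulate_ridge_error(n, p, lam, rho_time, rho_space, noise_std=0.5, seed=0):
    """Squared estimation error of ridge regression on dependent Gaussian covariates.

    Rows of X are correlated across samples (rho_time) and columns are
    correlated across features (rho_space). This is a hypothetical setup.
    """
    rng = np.random.default_rng(seed)

    # Cholesky factors of the temporal (n x n) and spatial (p x p) covariances.
    chol_time = np.linalg.cholesky(ar1_covariance(n, rho_time))
    chol_space = np.linalg.cholesky(ar1_covariance(p, rho_space))

    # X = A^{1/2} Z B^{1/2} gives a separable covariance across samples and features.
    z = rng.standard_normal((n, p))
    x = chol_time @ z @ chol_space.T

    # Ground-truth coefficients with squared norm close to 1, plus additive noise.
    beta = rng.standard_normal(p) / np.sqrt(p)
    y = x @ beta + noise_std * rng.standard_normal(n)

    # Ridge estimator: (X^T X / n + lam * I)^{-1} X^T y / n.
    gram = x.T @ x / n + lam * np.eye(p)
    beta_hat = np.linalg.solve(gram, x.T @ y / n)

    return float(np.sum((beta_hat - beta) ** 2))


if __name__ == "__main__":
    # Fix the ratio p / n and compare independent vs. dependent samples.
    for rho_time in (0.0, 0.5, 0.9):
        err = simulate_ridge_error(n=400, p=200, lam=0.1,
                                   rho_time=rho_time, rho_space=0.3)
        print(f"rho_time={rho_time:.1f}  squared estimation error={err:.4f}")
```

Sweeping `lam` or the ratio `p / n` in this sketch is one way to explore numerically how dependence shifts the best regularization level and where double descent appears; averaging over several seeds would reduce the Monte Carlo noise.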