High-dimensional linear regression has been thoroughly studied in the context of independent and identically distributed data. We propose to investigate high-dimensional regression models for independent but non-identically distributed data. To this end, we suppose that the set of observed predictors (or features) is a random matrix with a variance profile and with dimensions growing at a proportional rate. Assuming a random effect model, we study the predictive risk of the ridge estimator for linear regression with such a variance profile. In this setting, we provide deterministic equivalents of this risk and of the degrees of freedom of the ridge estimator. For certain classes of variance profiles, our work highlights the emergence of the well-known double descent phenomenon in high-dimensional regression for the minimum norm least-squares estimator when the ridge regularization parameter goes to zero. We also exhibit variance profiles for which the shape of this predictive risk differs from double descent. The proofs of our results are based on tools from random matrix theory for matrices with a variance profile, which have not previously been used to study regression models. Numerical experiments are provided to illustrate the accuracy of these deterministic equivalents for computing the predictive risk of ridge regression. We also investigate the similarities and differences with the standard setting of independent and identically distributed data.
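To make the setting concrete, the following is a minimal simulation sketch, not the authors' code. It assumes Gaussian entries, a hypothetical two-block variance profile, a ridge estimator normalized by the sample size, and a prediction error measured on a fresh design drawn from the same profile; the paper's exact definitions and normalizations may differ. All function names and parameter values (`alpha`, `sigma`, `n_rep`) are illustrative choices.

```python
import numpy as np


def sample_design(profile, rng):
    """Draw X with independent centred Gaussian entries, Var(X[i, j]) = profile[i, j]."""
    return np.sqrt(profile) * rng.standard_normal(profile.shape)


def ridge_fit(X, y, lam):
    """Ridge estimator (X^T X / n + lam * I)^{-1} X^T y / n (assumed normalization)."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def degrees_of_freedom(X, lam):
    """Trace of the ridge hat matrix X (X^T X + n * lam * I)^{-1} X^T."""
    n, p = X.shape
    G = X.T @ X / n
    # tr(X (X^T X + n lam I)^{-1} X^T) = tr((G + lam I)^{-1} G)
    return float(np.trace(np.linalg.solve(G + lam * np.eye(p), G)))


def empirical_risk(profile, lam, alpha=1.0, sigma=0.5, n_rep=20, seed=0):
    """Monte Carlo estimate of the prediction error under a random effect model.

    Assumptions (illustrative): beta has i.i.d. N(0, alpha^2 / p) coordinates,
    the noise is N(0, sigma^2), and the error is evaluated on a fresh design
    drawn from the same variance profile.
    """
    rng = np.random.default_rng(seed)
    n, p = profile.shape
    errors = []
    for _ in range(n_rep):
        beta = alpha * rng.standard_normal(p) / np.sqrt(p)
        X = sample_design(profile, rng)
        y = X @ beta + sigma * rng.standard_normal(n)
        beta_hat = ridge_fit(X, y, lam)
        X_new = sample_design(profile, rng)
        errors.append(np.mean((X_new @ (beta_hat - beta)) ** 2))
    return float(np.mean(errors))


# Hypothetical profile: the first half of the rows has variance 1, the second half variance 4,
# so the rows are independent but not identically distributed.
n, p = 200, 100
row_scale = np.where(np.arange(n) < n // 2, 1.0, 4.0)
profile = np.outer(row_scale, np.ones(p))
```

Evaluating `empirical_risk(profile, lam)` over a grid of small `lam` values and several ratios `p / n` gives an empirical risk curve that could be compared against a deterministic equivalent; a constant profile recovers the standard i.i.d. setting.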