We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, which enables approximation of the mean $μ_{\hatθ}$ and covariance $C_{\hatθ}$ of the ERM estimator $\hatθ$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hatθ^\top x$ is approximately distributed as the sum of $μ_{\hatθ}^\top x$, whose distribution is generally non-Gaussian, and an independent centered Gaussian variable of variance $\text{Tr}(C_{\hatθ}\mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and its gradient at $μ_{\hatθ}$. Numerical simulations across diverse losses and models validate our theoretical predictions and qualitative insights.
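As a reading aid, the distributional claim above can be written as a single display. This is a schematic restatement of the sentence in the abstract, not the formal theorem; the symbol $g$ for the standard Gaussian variable is notation introduced here for illustration.

$$
\hatθ^\top x \;\overset{d}{\approx}\; μ_{\hatθ}^\top x + \sqrt{\text{Tr}\big(C_{\hatθ}\,\mathbb{E}[xx^\top]\big)}\; g, \qquad g \sim \mathcal{N}(0,1) \ \text{independent of } x .
$$

In this form, the first term carries whatever non-Gaussianity the test covariate $x$ has along the direction $μ_{\hatθ}$, while the second term is Gaussian regardless of the data distribution.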