We consider a general model for high-dimensional empirical risk minimization in which the data $\mathbf{x}_i$ are $d$-dimensional isotropic Gaussian vectors, the model is parametrized by $\mathbf{\Theta}\in\mathbb{R}^{d\times k}$, and the loss depends on the data via the projection $\mathbf{\Theta}^\mathsf{T}\mathbf{x}_i$. This setting covers, as special cases, classical statistical methods (e.g. multinomial regression and other generalized linear models), as well as two-layer fully connected neural networks with $k$ hidden neurons. We use the Kac-Rice formula from Gaussian process theory to derive a bound on the expected number of local minima of this empirical risk, under the proportional asymptotics in which $n,d\to\infty$ with $n\asymp d$. Via Markov's inequality, this bound allows us to determine the positions of these minimizers (with exponential deviation bounds) and hence to derive sharp asymptotics for the estimation and prediction errors. In this paper, we apply our characterization to convex losses, for which high-dimensional asymptotics had not (in general) been rigorously established for $k\ge 2$. We show that our approach is tight and allows us to prove previously conjectured results. In addition, we characterize the spectrum of the Hessian at the minimizer. A companion paper applies our general result to non-convex examples.
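For concreteness, a schematic form of the empirical risk in this setting (with the loss $\ell$ and responses $y_i$ left generic, as the precise assumptions are given in the paper) is
\[
\widehat{R}_n(\mathbf{\Theta}) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(\mathbf{\Theta}^\mathsf{T}\mathbf{x}_i;\, y_i\big),
\qquad \mathbf{x}_i \sim \mathcal{N}(\mathbf{0},\mathbf{I}_d),\quad \mathbf{\Theta}\in\mathbb{R}^{d\times k},
\]
so that the dependence on the $d$-dimensional data is only through the $k$-dimensional projections $\mathbf{\Theta}^\mathsf{T}\mathbf{x}_i$.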