General first-order methods (GFOMs), including various gradient descent and AMP algorithms, constitute a broad class of iterative algorithms in modern statistical learning problems. Some GFOMs also serve as constructive proof devices, iteratively characterizing the empirical distributions of statistical estimators in the large system limit for any fixed number of iterations. This paper develops a non-asymptotic, entrywise characterization for a general class of GFOMs. Our characterizations capture the precise entrywise behavior of the GFOMs and hold universally across a broad class of heterogeneous random matrix models. As a corollary, we provide the first non-asymptotic description of the empirical distributions of GFOMs beyond Gaussian ensembles. We demonstrate the utility of these general results in two applications. In the first application, we prove entrywise universality for regularized least squares estimators in the linear model by controlling the entrywise error relative to a suitably constructed GFOM. This algorithmic proof method also yields systematically improved averaged universality results for regularized regression estimators in the linear model, and resolves the universality conjecture for (regularized) MLEs in logistic regression. In the second application, we obtain entrywise Gaussian approximations for a class of gradient descent algorithms. Our approach provides a non-asymptotic state evolution for the bias and variance of the algorithm along the iteration path, and applies to non-convex loss functions. The proof relies on a new recursive leave-k-out method that yields almost complete delocalization for the GFOMs and their derivatives. Crucially, our method ensures entrywise universality for up to poly-logarithmically many iterations, which facilitates effective $\ell_2/\ell_\infty$ control between certain GFOMs and statistical estimators in applications.
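For concreteness, a GFOM is a matrix-times-nonlinearity recursion; a minimal sketch in the symmetric setting, following the standard formulation in the GFOM literature (the symbols $A$, $\mathsf{F}_t$, $\mathsf{G}_t$, $z^{(t)}$ below are illustrative and the paper's own notation may differ), reads
$$
z^{(t+1)} \;=\; A\,\mathsf{F}_t\bigl(z^{(1)},\ldots,z^{(t)}\bigr) \;+\; \mathsf{G}_t\bigl(z^{(1)},\ldots,z^{(t)}\bigr), \qquad t \ge 1,
$$
where $A$ is the (possibly heterogeneous) random matrix, the maps $\mathsf{F}_t,\mathsf{G}_t$ act row-wise on the past iterates, and the recursion is started from an initialization $z^{(1)}$. Gradient descent and AMP algorithms correspond to particular choices of $\mathsf{F}_t,\mathsf{G}_t$, and the entrywise characterization above tracks each coordinate of $z^{(t)}$ along the iteration path.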