Fisher-Geometric Diffusion in Stochastic Gradient Descent: Optimal Rates, Oracle Complexity, and Information-Theoretic Limits

Classical stochastic-approximation analyses treat the covariance of stochastic gradients as an exogenous modeling input. We show that under exchangeable mini-batch sampling this covariance is identified by the sampling mechanism itself: to leading order it is the projected covariance of per-sample gradients. In well-specified likelihood problems this reduces locally to projected Fisher information; for general M-estimation losses the same object is the projected gradient covariance G*(theta), which together with the Hessian induces sandwich/Godambe geometry. This identification -- not the subsequent diffusion or Lyapunov machinery, which is classical once the noise matrix is given -- is the paper's main contribution. It endogenizes the diffusion coefficient (with effective temperature tau = eta/b), determines the stationary covariance via a Lyapunov equation whose inputs are now structurally fixed, and selects the identified statistical geometry as the natural metric for convergence analysis. We prove matching upper and lower bounds of order Theta(1/N) for risk in this metric under an oracle budget N; the lower bound is established first via a van Trees argument in the parametric Fisher setting and then extended to adaptive oracle transcripts under a predictable-information condition and mild conditional likelihood regularity. Translating these bounds into oracle complexity yields epsilon-stationarity guarantees in the Fisher dual norm that depend on an intrinsic effective dimension d_eff and a statistical condition number kappa_F, rather than ambient dimension or Euclidean conditioning. Numerical experiments confirm the Lyapunov predictions at both continuous-time and discrete-time levels and show that scalar temperature matching cannot reproduce directional noise structure.

翻译：经典随机逼近分析将随机梯度的协方差视为外生建模输入。我们证明，在可交换小批量采样下，该协方差由采样机制本身确定：主导阶项是每个样本梯度的投影协方差。在良好设定的似然问题中，这局部退化为投影 Fisher 信息；对于一般 M-估计损失，同一对象是投影梯度协方差 G*(theta)，其与 Hessian 矩阵共同诱导出三明治/Godambe 几何。这一识别过程（而非后续的扩散或 Lyapunov 机制——后者在噪声矩阵给定后即为经典方法）是本文的主要贡献。它内生了扩散系数（有效温度 tau = eta/b），通过 Lyapunov 方程确定稳态协方差（其输入现在在结构上固定），并将识别出的统计几何选为收敛性分析的自然度量。在预言机预算 N 下，我们证明该度量下风险的最优上下界均为 Theta(1/N) 量级；下界首先通过参数 Fisher 设定下的 van Trees 论证建立，随后在可预测信息条件与温和条件似然正则性下扩展到自适应预言机转录。将这些界限转化为预言机复杂度，可在 Fisher 对偶范数下得到依赖内在有效维度 d_eff 和统计条件数 kappa_F（而非环境维度或欧几里得条件数）的 epsilon-平稳性保证。数值实验在连续时间与离散时间层面均验证了 Lyapunov 预测，并表明标量温度匹配无法再现方向性噪声结构。