Stochastic gradient descent (SGD) is central to simulation optimization, stochastic programming, and online M-estimation, where sampling effort is a decision variable. We study the mini-batch gradient noise as a sampling-design object. Under exchangeable fresh-sampling mini-batches of size b, the conditional covariance given the de Finetti directing measure mu is b^{-1} G_mu(theta), and under identifiability the projected population object is b^{-1} G*(theta): the projected Fisher information for correctly specified likelihoods, and the sandwich partner of the Hessian otherwise. This identification fixes the noise matrix entering the diffusion analysis of SGD with constant step size eta: the raw iterate path has a deterministic fluid limit, and the sqrt(b/eta)-scaled fluctuations satisfy a functional CLT with noise covariance G*. Near a nondegenerate optimum the limit is an Ornstein-Uhlenbeck process, and its Lyapunov covariance scaled by eta/b matches the stationary covariance of the linearized discrete recursion at leading order. Under a curvature-noise compatibility condition mu_F > 0, we prove 1/N mean-square upper bounds and an i.i.d. parametric Fisher van Trees lower bound of the same rate order, with oracle-complexity guarantees depending on an effective dimension d_eff and a condition number kappa_F. Numerical experiments verify the identification and confirm the Lyapunov predictions in direct SGD.
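The leading-order agreement between the eta/b-scaled Lyapunov covariance and the linearized discrete recursion can be illustrated with a minimal sketch. It is not the paper's experiment code: it assumes a quadratic model written in the eigenbasis of the Hessian H (so H = diag(h)), a linearized recursion x_{k+1} = (I - eta H) x_k + eta xi_k with Cov(xi_k) = G/b, and hypothetical values for h, G, eta, and b. The function names are illustrative. In this basis both Lyapunov equations have entrywise closed forms, so no linear-algebra library is needed.

```python
def ou_covariance(h, G, eta, b):
    """Return (eta/b) * Sigma, where Sigma solves H Sigma + Sigma H = G
    for H = diag(h). Entrywise: Sigma_ij = G_ij / (h_i + h_j)."""
    d = len(h)
    return [[(eta / b) * G[i][j] / (h[i] + h[j]) for j in range(d)]
            for i in range(d)]


def discrete_covariance(h, G, eta, b):
    """Stationary covariance S of x_{k+1} = A x_k + eta * xi_k with
    A = I - eta * diag(h) and Cov(xi_k) = G / b.
    S solves S = A S A + (eta^2 / b) G; entrywise,
    S_ij = (eta^2 / b) G_ij / (1 - a_i a_j). Requires 0 < eta * h_i < 2."""
    d = len(h)
    a = [1.0 - eta * hi for hi in h]
    return [[(eta ** 2 / b) * G[i][j] / (1.0 - a[i] * a[j]) for j in range(d)]
            for i in range(d)]


def max_relative_gap(h, G, eta, b):
    """Largest entrywise relative difference between the discrete stationary
    covariance and the eta/b-scaled Lyapunov prediction (G entries nonzero)."""
    P = ou_covariance(h, G, eta, b)
    S = discrete_covariance(h, G, eta, b)
    d = len(h)
    return max(abs(S[i][j] - P[i][j]) / abs(P[i][j])
               for i in range(d) for j in range(d))


# Hypothetical example values (assumptions, not taken from the paper).
h = [1.0, 4.0]                    # Hessian eigenvalues at the optimum
G = [[2.0, 0.5], [0.5, 1.0]]      # per-sample gradient noise covariance
b = 8                             # mini-batch size

for eta in (1e-1, 1e-2, 1e-3):
    print(eta, max_relative_gap(h, G, eta, b))
```

Under these assumptions the entrywise ratio of the two covariances is 1 / (1 - eta h_i h_j / (h_i + h_j)), so the relative gap shrinks linearly in eta and does not depend on b. This is the sense in which the two descriptions agree at leading order.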