Exponential moving averages (EMAs) are a central component of widely used adaptive optimizers such as Adam. However, existing analyses of Adam-style methods often yield suboptimal guarantees in the zero-noise regime, rely on open-loop parameter schedules, or require prior knowledge of smoothness constants. Motivated by these limitations, we introduce OptEMA and analyze two complementary variants: OptEMA-M, which applies an adaptive, decreasing EMA coefficient to the first moment with a fixed second-moment decay, and OptEMA-V, which swaps these roles. At the heart of these variants is a Corrected AdaGrad-Norm coefficient schedule. This formulation renders OptEMA algorithmically closed-loop and Lipschitz-free, meaning its effective stepsizes are trajectory-dependent and require no parameterization via the Lipschitz constant. Under lower-boundedness, unbiasedness, bounded variance, average smoothness, and a bounded stochastic-gradient condition used to control the adaptive normalizers, we prove that both variants achieve the unified noise-adaptive rate $\widetilde{\mathcal{O}}\left(T^{-1/2}+\sigma^{1/2}T^{-1/4}\right)$ for the averaged gradient norm. In the zero-noise regime, these bounds automatically reduce to the nearly optimal deterministic rate $\widetilde{\mathcal{O}}(T^{-1/2})$ without manual hyperparameter retuning.
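To illustrate what a closed-loop, Lipschitz-free stepsize looks like, the following minimal sketch implements the standard AdaGrad-Norm rule, in which the stepsize is computed only from the gradients observed along the trajectory. It is not OptEMA and does not reproduce the Corrected AdaGrad-Norm coefficient schedule, whose exact form is not stated here. The function names, the constants `eta` and `b0`, and the quadratic test problem are all illustrative assumptions.

```python
import math


def adagrad_norm(grad, x0, steps, eta=1.0, b0=1e-8):
    """Standard AdaGrad-Norm with a single scalar stepsize.

    Illustrative only: this is the classical rule, not the OptEMA update.
    """
    x = list(x0)
    accum = b0 ** 2  # running sum of squared gradient norms
    grad_norms = []
    for _ in range(steps):
        g = grad(x)
        sq_norm = sum(gi * gi for gi in g)
        grad_norms.append(math.sqrt(sq_norm))
        accum += sq_norm
        # The stepsize depends only on gradients seen so far;
        # no smoothness (Lipschitz) constant is supplied.
        step = eta / math.sqrt(accum)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, grad_norms


def quadratic_grad(x):
    # Gradient of the hypothetical test function f(x) = 0.5 * (x0^2 + 10 * x1^2).
    return [1.0 * x[0], 10.0 * x[1]]


# Zero-noise (deterministic) run on the toy quadratic.
x_final, norms = adagrad_norm(quadratic_grad, [1.0, 1.0], steps=200)
avg_grad_norm = sum(norms) / len(norms)
```

In this deterministic run the accumulated sum of squared gradient norms stays bounded, so the stepsize settles to a roughly constant value without any tuning against the curvature of the problem. This is the kind of trajectory-dependent behavior the abstract refers to.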