The question of when to stop stochastic gradient descent (SGD) has long lacked a statistically rigorous answer. Although SGD is routinely monitored as it runs, the classical theory of SGD provides guarantees only at pre-specified iteration horizons and offers no valid way to decide, based on the observed trajectory, when further computation is justified. We address this gap by developing anytime-valid confidence sequences for stochastic gradient methods, which remain valid under continuous monitoring and directly induce statistically valid, trajectory-dependent stopping rules: stop as soon as the current upper confidence bound on an appropriate performance measure falls below a user-specified tolerance. The confidence sequences are constructed from nonnegative supermartingales, are time-uniform, and depend only on observable quantities along the SGD trajectory, without requiring prior knowledge of the optimization horizon. In convex optimization, this yields anytime-valid certificates for the weighted suboptimality of projected SGD under general stepsize schedules, without assuming smoothness or strong convexity. In nonconvex optimization, it yields time-uniform certificates for weighted first-order stationarity under smoothness assumptions. We further characterize the stopping-time complexity of the resulting stopping rules under standard stepsize schedules. To the best of our knowledge, this is the first framework that provides statistically valid, time-uniform stopping rules for SGD across both convex and nonconvex settings based solely on the observed trajectory.
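The stop-when-the-bound-crosses-the-tolerance principle can be illustrated with a minimal sketch. The sketch below is *not* the paper's construction: it monitors a clipped squared gradient norm (assumed bounded in [0, 1]) along an SGD run on a toy quadratic objective, and uses a fixed-parameter Hoeffding supermartingale. By Ville's inequality applied to the nonnegative supermartingale exp(λ·Σᵢ(μ − Xᵢ) − tλ²/8), the bound X̄ₜ + λ/8 + log(1/α)/(λt) upper-bounds the mean μ simultaneously for all t with probability at least 1 − α, so stopping the first time it falls below the tolerance is anytime-valid. The objective, the choice λ = 1, and the tolerance are illustrative assumptions.

```python
import numpy as np

def hoeffding_ucb(running_sum, t, lam, alpha):
    """Time-uniform upper confidence bound on the mean mu of i.i.d.
    observations X_i in [0, 1], valid simultaneously for all t >= 1.
    Obtained via Ville's inequality for the nonnegative supermartingale
    exp(lam * sum_i (mu - X_i) - t * lam**2 / 8)."""
    return running_sum / t + lam / 8.0 + np.log(1.0 / alpha) / (lam * t)

rng = np.random.default_rng(0)
alpha, lam, tol = 0.05, 1.0, 0.3   # confidence level, tuning param, tolerance

# Toy stochastic objective: f(x) = 0.5 * E[(x - Z)^2] with Z ~ N(1, 0.1^2),
# so the stochastic gradient g = x - z is unbiased and the minimizer is x = 1.
x, step = 5.0, 0.05
running_sum, t = 0.0, 0

while True:
    t += 1
    z = rng.normal(1.0, 0.1)
    g = x - z                       # unbiased stochastic gradient
    x -= step * g                   # SGD update
    running_sum += min(g * g, 1.0)  # monitored statistic, clipped to [0, 1]
    if hoeffding_ucb(running_sum, t, lam, alpha) < tol:
        break                       # anytime-valid stopping: UCB below tolerance

print(f"stopped at iteration {t}, x = {x:.3f}")
```

Because the bound holds uniformly over t, checking it after every iteration does not inflate the error probability, which is exactly what a fixed-horizon concentration bound would fail to guarantee under continuous monitoring.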