We consider stochastic optimization problems in which the objective depends on a parameter, as is commonly the case, for instance, in hyperparameter optimization. We investigate the behavior of the derivatives of the Stochastic Gradient Descent (SGD) iterates with respect to that parameter and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD iterates. This enables us to establish that, whenever the objective is strongly convex, the derivatives of SGD converge in mean squared error to the derivative of the solution mapping. Specifically, we demonstrate that with constant step-sizes these derivatives stabilize within a noise ball centered at the solution derivative, and that with vanishing step-sizes they exhibit $O(\log(k)^2 / k)$ convergence rates. Additionally, we prove exponential convergence in the interpolation regime. Our theoretical findings are illustrated by numerical experiments on synthetic tasks.
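To make the mechanism concrete, the following is a minimal sketch of the differentiated recursion, assuming SGD iterates $x_k(\theta)$ on a stochastic objective $f(x, \theta, \xi)$ with step-sizes $\gamma_k$ and samples $\xi_k$; this notation is illustrative and not taken from the paper's own definitions. Starting from the SGD update
\[
x_{k+1}(\theta) = x_k(\theta) - \gamma_k \nabla_x f\bigl(x_k(\theta), \theta, \xi_k\bigr),
\]
differentiating both sides with respect to $\theta$ yields
\[
\partial_\theta x_{k+1}(\theta) = \partial_\theta x_k(\theta) - \gamma_k \Bigl( \nabla^2_{xx} f\bigl(x_k(\theta), \theta, \xi_k\bigr)\, \partial_\theta x_k(\theta) + \nabla^2_{x\theta} f\bigl(x_k(\theta), \theta, \xi_k\bigr) \Bigr),
\]
which can be read as an inexact SGD step on a quadratic objective in $\partial_\theta x_k(\theta)$, with the inexactness controlled by how far $x_k(\theta)$ still is from the solution.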