Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications, including hyperparameter optimization, learning data-adaptive regularizers, and optimizing forward operators. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods. Existing adaptive methods predominantly rely on deterministic formulations, while stochastic approaches often adopt a doubly-stochastic framework with impractical variance assumptions, enforce a fixed number of lower-level iterations, and require extensive tuning. In this work, we focus on bilevel learning with strongly convex lower-level problems and a nonconvex sum-of-functions objective in the upper level. Stochasticity arises from data sampling in the upper level, which leads to inexact stochastic hypergradients. We establish their connection to state-of-the-art stochastic optimization theory for nonconvex objectives. Furthermore, we prove the convergence of inexact stochastic bilevel optimization under mild assumptions. Our empirical results highlight significant speed-ups and improved generalization in imaging tasks such as image denoising and deblurring, compared with adaptive deterministic bilevel methods.
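To fix ideas, a minimal sketch of the problem class described above, with illustrative notation (the symbols \(F\), \(f_i\), \(g_i\), and \(\hat{x}_i(\theta)\) are assumed names, not fixed by the text): the upper level is a nonconvex finite sum over data samples, and each lower-level problem is strongly convex in its variable,
\[
\min_{\theta}\; F(\theta) \;=\; \frac{1}{N}\sum_{i=1}^{N} f_i\bigl(\hat{x}_i(\theta)\bigr),
\qquad
\hat{x}_i(\theta) \;=\; \arg\min_{x}\; g_i(x,\theta),
\]
where the strong convexity of \(g_i(\cdot,\theta)\) makes each \(\hat{x}_i(\theta)\) unique. Sampling a mini-batch of upper-level terms and solving the corresponding lower-level problems only to a prescribed accuracy then yields an inexact stochastic hypergradient in place of the exact gradient \(\nabla F(\theta)\).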