Many optimization problems require hyperparameters, i.e., parameters that must be specified in advance, such as regularization parameters and parametric regularizers in variational regularization methods for inverse problems, and dictionaries in compressed sensing. A data-driven approach to determining appropriate hyperparameter values is via a nested optimization framework known as bilevel learning. Even when it is possible to apply a gradient-based solver to the bilevel optimization problem, computing the gradients, known as hypergradients, is computationally challenging: each one requires both the solution of a minimization problem and a linear system solve. These linear systems change only slowly over the iterations, which motivates us to apply recycling Krylov subspace methods, in which information from one linear system solve is reused to solve the next. Existing recycling strategies often employ eigenvector approximations called Ritz vectors. In this work we propose a novel recycling strategy based on a new concept, Ritz generalized singular vectors, which accounts for the bilevel setting. Additionally, while existing iterative methods typically terminate based on the residual norm, the new concept allows us to define a stopping criterion that directly approximates the error of the associated hypergradient. The proposed approach is validated through extensive numerical testing in the context of an inverse problem in imaging.
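To make the hypergradient structure concrete, here is a minimal sketch of the computation the abstract describes for a toy bilevel problem: a lower-level minimization solve followed by a linear system solve, with information from one solve reused to warm-start the next. All problem data (the ridge-regularized lower-level problem, the matrix A, and the one-dimensional recycle basis built from the previous adjoint solution) are illustrative assumptions; this is a plain Galerkin warm start standing in for, not implementing, the paper's Ritz-generalized-singular-vector recycling.

```python
# Hedged sketch: hypergradients via the implicit function theorem, with a
# simple subspace-recycling warm start for the linear solve. Not the paper's
# method; problem setup is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 40
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.05 * rng.standard_normal(m)

def lower_level_solve(theta):
    """x*(theta) = argmin_x ||Ax - b||^2 + theta ||x||^2 (closed form here)."""
    return np.linalg.solve(A.T @ A + theta * np.eye(n), A.T @ b)

def cg(H, rhs, x0, tol=1e-10, maxiter=500):
    """Plain conjugate gradients for the SPD system H x = rhs."""
    x = x0.copy()
    r = rhs - H @ x
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        if np.sqrt(rs) < tol:
            break
        Hp = H @ p
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def hypergradient(theta, U=None):
    """dL/dtheta for the upper-level loss L(theta) = 0.5 ||x*(theta) - x_true||^2.

    Implicit function theorem: solve H w = x* - x_true with the lower-level
    Hessian H = 2(A^T A + theta I); since d/dtheta grad_x f = 2 x*, the
    hypergradient is -w^T (2 x*). A recycle basis U, if given, supplies the
    Galerkin warm start x0 = U (U^T H U)^{-1} U^T rhs.
    """
    x_star = lower_level_solve(theta)
    H = 2.0 * (A.T @ A + theta * np.eye(n))
    rhs = x_star - x_true
    x0 = np.zeros(n)
    if U is not None:  # recycling: project the system onto span(U)
        x0 = U @ np.linalg.solve(U.T @ H @ U, U.T @ rhs)
    w = cg(H, rhs, x0)
    return -2.0 * (w @ x_star), w

# Gradient descent on theta; recycle the previous adjoint solution as a
# one-dimensional subspace (a stand-in for the Ritz-vector bases the
# abstract refers to).
theta, U = 1.0, None
for k in range(20):
    g, w = hypergradient(theta, U)
    U = (w / np.linalg.norm(w)).reshape(-1, 1)
    theta = max(theta - 0.1 * g, 1e-6)
print(f"learned theta = {theta:.4f}")
```

Because consecutive Hessians differ only through the slowly varying theta, the warm start from the previous solve's subspace typically cuts the CG iteration count, which is the effect that motivates recycling in the first place.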