Learning to optimize (L2O) is an approach that leverages training data to accelerate the solution of optimization problems. Many approaches use unrolling to parameterize the update step and learn optimal parameters. Although L2O has shown empirical advantages over classical optimization algorithms, memory restrictions often greatly limit the unroll length, and learned algorithms usually come without convergence guarantees. In contrast, we introduce a novel method that employs a greedy strategy to learn iteration-specific parameters by minimizing the function value at the next iterate. This enables training over significantly more iterations while keeping GPU memory usage constant. We parameterize the update such that parameter learning corresponds to solving a convex optimization problem at each iteration. In particular, we explore preconditioned gradient descent with several parameterizations, including a novel convolutional preconditioner. For our learned algorithm, convergence on the training set is proved even when the preconditioner is neither symmetric nor positive definite. We also establish convergence on a class of unseen functions, ensuring robust performance and generalization beyond the training data. We test our learned algorithms on two inverse problems, image deblurring and Computed Tomography, on which learned convolutional preconditioners demonstrate improved empirical performance over classical optimization algorithms such as Nesterov's Accelerated Gradient Method and the quasi-Newton method L-BFGS.
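The greedy strategy described above can be illustrated with a minimal sketch. The setup below is entirely hypothetical (toy least-squares objectives, a diagonal rather than convolutional preconditioner, and a plain stacked least-squares solve for the convex subproblem), but it follows the stated recipe: at each iteration, learn iteration-specific preconditioner parameters by minimizing the function value at the next iterate over a training set, which for quadratics is a convex problem, then apply the learned sequence to an unseen function. Memory usage stays constant in the number of iterations because no unrolled computation graph is kept.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, n_train, K = 30, 10, 5, 40  # hypothetical problem sizes

# Hypothetical training set: least-squares objectives f_i(x) = 0.5*||A_i x - b_i||^2.
As = [rng.standard_normal((m, n)) for _ in range(n_train)]
bs = [rng.standard_normal(m) for _ in range(n_train)]

def f(A, b, x):
    r = A @ x - b
    return 0.5 * r @ r

def grad(A, b, x):
    return A.T @ (A @ x - b)

# Greedy training: at iteration k, choose the diagonal preconditioner d_k that
# minimizes the summed next-iterate value sum_i f_i(x_i - d * g_i) over the
# training set. For quadratics this is a single convex least-squares problem
# in d, since f_i(x_i - d * g_i) = 0.5*||A_i diag(g_i) d - (A_i x_i - b_i)||^2.
xs = [np.zeros(n) for _ in range(n_train)]
learned = []
for k in range(K):
    Ms, rs = [], []
    for A, b, x in zip(As, bs, xs):
        g = grad(A, b, x)
        Ms.append(A * g)          # equals A @ diag(g), via column-wise broadcasting
        rs.append(A @ x - b)
    d, *_ = np.linalg.lstsq(np.vstack(Ms), np.concatenate(rs), rcond=None)
    learned.append(d)             # iteration-specific parameters, stored
    xs = [x - d * grad(A, b, x) for A, b, x in zip(As, bs, xs)]

# Apply the learned iteration-specific preconditioners to an unseen problem.
A_test, b_test = rng.standard_normal((m, n)), rng.standard_normal(m)
x = np.zeros(n)
for d in learned:
    x = x - d * grad(A_test, b_test, x)
print(f(A_test, b_test, x))
```

Because d = 0 (keeping the current iterates) is always feasible in each convex subproblem, the summed training objective is non-increasing by construction, mirroring the training-set convergence property claimed in the abstract; behavior on unseen functions depends on how well the test problem matches the training distribution.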