Learning to optimize (L2O) is an approach that leverages training data to accelerate the solution of optimization problems. Many approaches use unrolling to parameterize the update step and learn optimal parameters. Although L2O has shown empirical advantages over classical optimization algorithms, memory restrictions often greatly limit the unroll length, and learned algorithms usually do not provide convergence guarantees. In contrast, we introduce a novel method employing a greedy strategy that learns iteration-specific parameters by minimizing the function value at the next iteration. This enables training over significantly more iterations while maintaining constant memory usage. We parameterize the update such that parameter learning corresponds to solving a convex optimization problem at each iteration. In particular, we explore preconditioned gradient descent with multiple parameterizations, including a novel convolutional preconditioner. With our learned algorithm, convergence on the training set is proven even when the preconditioner is neither symmetric nor positive definite. Convergence on a class of unseen functions is also obtained, ensuring robust performance and generalization beyond the training data. We test our learned algorithms on two inverse problems, image deblurring and Computed Tomography, on which the learned convolutional preconditioner demonstrates improved empirical performance over classical optimization algorithms such as Nesterov's Accelerated Gradient Method and the quasi-Newton method L-BFGS.
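To make the greedy strategy concrete, the following is a minimal sketch of learning one iteration-specific diagonal preconditioner per step on quadratic training objectives, where the per-iteration subproblem is convex and admits a closed-form solution. This is an illustrative toy, not the paper's method: the function and parameter names (greedy_diagonal_preconditioners, A_list, b_list, n_iters) are hypothetical, and the paper's convolutional preconditioner and inverse-problem objectives are replaced here by the simplest convex instance that exposes the idea of minimizing the function value at the next iterate.

```python
# Greedy learning of a diagonal preconditioner p_k for each iteration k, on
# quadratic training functions f_i(x) = 0.5 x^T A_i x - b_i^T x.
# Update rule: x_{k+1} = x_k - p_k * grad f_i(x_k) (elementwise product).
# All names here are illustrative; this is not the paper's implementation.
import numpy as np

def greedy_diagonal_preconditioners(A_list, b_list, x0, n_iters):
    """Learn one diagonal preconditioner per iteration by minimizing the
    average training objective at the next iterate."""
    xs = [x0.copy() for _ in A_list]   # one trajectory per training function
    learned = []                       # iteration-specific parameters p_0, ..., p_{K-1}
    d = x0.size
    for k in range(n_iters):
        grads = [A @ x - b for A, x, b in zip(A_list, xs, b_list)]
        # The greedy subproblem min_p sum_i f_i(x_i - p * g_i) is a convex
        # quadratic in p; its optimality condition is the linear system M p = c.
        M = sum(np.diag(g) @ A @ np.diag(g) for A, g in zip(A_list, grads))
        c = sum(g * g for g in grads)
        p_k = np.linalg.solve(M + 1e-10 * np.eye(d), c)  # small ridge for stability
        learned.append(p_k)
        # Apply the learned preconditioned gradient step to every trajectory.
        xs = [x - p_k * g for x, g in zip(xs, grads)]
    return learned, xs

# Toy usage: a handful of random strongly convex quadratics in R^5.
rng = np.random.default_rng(0)
A_list = [(lambda B: B @ B.T + np.eye(5))(rng.standard_normal((5, 5))) for _ in range(4)]
b_list = [rng.standard_normal(5) for _ in range(4)]
learned, xs = greedy_diagonal_preconditioners(A_list, b_list, np.zeros(5), n_iters=20)
```

Because each p_k is obtained from the current iterates only, memory usage stays constant in the number of iterations, in contrast to unrolled training, which must store the whole trajectory for backpropagation.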