Learning to optimize (L2O) is an approach that leverages training data to accelerate the solution of optimization problems. Many approaches use unrolling to parameterize the update step and learn optimal parameters. Although L2O has shown empirical advantages over classical optimization algorithms, memory restrictions often greatly limit the unroll length, and learned algorithms usually do not provide convergence guarantees. In contrast, we introduce a novel method employing a greedy strategy that learns iteration-specific parameters by minimizing the function value at the next iteration. This enables training over significantly more iterations while maintaining constant GPU memory usage. We parameterize the update such that parameter learning corresponds to solving a convex optimization problem at each iteration. In particular, we explore preconditioned gradient descent with multiple parameterizations, including a novel convolutional preconditioner. With our learned algorithm, convergence on the training set is proven even when the preconditioner is neither symmetric nor positive definite. Convergence on a class of unseen functions is also obtained, ensuring robust performance and generalization beyond the training data. We test our learned algorithms on two inverse problems, image deblurring and computed tomography, on which learned convolutional preconditioners demonstrate improved empirical performance over classical optimization algorithms such as Nesterov's Accelerated Gradient Method and the quasi-Newton method L-BFGS.
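The greedy strategy described above can be illustrated on a toy problem. The following is a minimal sketch (not the paper's implementation) for a quadratic objective f(x) = ½ xᵀAx − bᵀx: at each iteration, a diagonal preconditioner d is chosen to minimize the function value at the next iterate x − diag(d)∇f(x), which for a quadratic is itself a convex quadratic subproblem in d. The problem sizes, matrix construction, and the ridge term are illustrative assumptions.

```python
# Hypothetical sketch of greedy per-iteration preconditioner learning,
# assuming a quadratic objective f(x) = 0.5 x^T A x - b^T x.
import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
A = M @ M.T + np.eye(n)          # symmetric positive definite test problem
b = rng.standard_normal(n)

def f(x):
    return 0.5 * x @ A @ x - b @ x

x = np.zeros(n)
for k in range(10):
    g = A @ x - b                 # gradient of f at the current iterate
    G = np.diag(g)
    # Convex subproblem in d: minimize f(x - diag(d) g).
    # Expanding gives Hessian G A G and linear term -(g * g);
    # a small ridge keeps the solve stable when some g_i vanish.
    H = G @ A @ G + 1e-10 * np.eye(n)
    d = np.linalg.solve(H, g * g)
    x = x - d * g                 # preconditioned gradient step

print(f(x))                       # function value after greedy training
```

Because the subproblem is convex, each preconditioner can be fit exactly per iteration without backpropagating through an unrolled trajectory, which is what keeps memory usage constant in the number of iterations.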