Learning to Generalize Provably in Learning to Optimize

Learning to optimize (L2O), which automates the design of optimizers through data-driven approaches, has gained increasing popularity. However, current L2O methods often generalize poorly in at least two respects: (i) when the L2O-learned optimizer is applied to unseen optimizees, in terms of lowering their loss function values (optimizer generalization, or ``generalizable learning of optimizers''); and (ii) in the test performance of an optimizee (itself a machine learning model) trained by the optimizer, in terms of accuracy on unseen data (optimizee generalization, or ``learning to generalize''). While optimizer generalization has recently been studied, optimizee generalization has not been rigorously studied in the L2O context; addressing it is the aim of this paper. We first theoretically establish an implicit connection between the local entropy and the Hessian, and hence unify their roles in the handcrafted design of generalizable optimizers as equivalent metrics of the flatness of the loss landscape. We then propose incorporating these two metrics as flatness-aware regularizers into the L2O framework in order to meta-train optimizers to learn to generalize, and theoretically show that such generalization ability can be learned during L2O meta-training and then transformed to the optimizee loss function. Extensive experiments consistently validate the effectiveness of our proposals, with substantially improved generalization on multiple sophisticated L2O models and diverse optimizees. Our code is available at: https://github.com/VITA-Group/Open-L2O/tree/main/Model_Free_L2O/L2O-Entropy.
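To make the two flatness metrics named above concrete, the following is a minimal sketch and not the paper's implementation. It assumes a toy quadratic loss as a stand-in for an optimizee loss, measures Hessian-based flatness by a finite-difference estimate of the Hessian trace, and measures local entropy with the common Gaussian-smoothed definition, estimated by Monte Carlo up to an additive constant. All function names and parameters (`loss`, `hessian_trace_fd`, `negative_local_entropy`, `gamma`, `n_samples`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def loss(theta, curvature):
    """Toy quadratic loss 0.5 * sum(curvature * theta**2) (an assumed stand-in)."""
    return 0.5 * np.sum(curvature * theta ** 2, axis=-1)


def hessian_trace_fd(f, theta, eps=1e-3):
    """Central finite-difference estimate of the Hessian trace of f at theta."""
    base = f(theta)
    trace = 0.0
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        trace += (f(theta + step) - 2.0 * base + f(theta - step)) / eps ** 2
    return trace


def negative_local_entropy(f, theta, gamma=1.0, n_samples=20000, seed=0):
    """Monte Carlo estimate of -log E[exp(-f(t))] with t ~ N(theta, I / gamma).

    This equals the negative local entropy up to an additive constant
    that does not depend on f.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=gamma ** -0.5, size=(n_samples, theta.size))
    return -np.log(np.mean(np.exp(-f(theta + noise))))


# Two minima at the origin: one flat (small curvature), one sharp (large curvature).
theta = np.zeros(2)
flat = lambda t: loss(t, np.array([0.1, 0.1]))
sharp = lambda t: loss(t, np.array([10.0, 10.0]))

for name, f in [("flat", flat), ("sharp", sharp)]:
    print(name, hessian_trace_fd(f, theta), negative_local_entropy(f, theta))
```

On this toy problem both quantities are larger at the sharp minimum than at the flat one, so penalizing either one favors flatter regions. This only illustrates the kind of agreement between the two metrics that the abstract refers to; it does not reproduce the paper's theoretical connection or its regularizers.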
