Learned optimizers are a crucial component of meta-learning. Recent advances in scalable learned optimizers have shown that they can outperform hand-designed optimizers on a variety of tasks. However, several characteristics of these models impede their widespread adoption: unstable learning curves, limited ability to handle unseen tasks and network architectures, difficult-to-control behaviours, and poor performance on fine-tuning tasks. To tackle the generalization problem in scalable learned optimizers, we propose a hybrid-update-based (HUB) optimization strategy inspired by recent advances in hard prompt tuning and result selection techniques used in large language and vision models. This approach can be easily applied to any task that involves a hand-designed or learned optimizer. By incorporating hand-designed optimizers as the second component of our hybrid approach, we retain the benefits of learned optimizers while stabilizing the training process and, more importantly, improving testing performance. We validate our design on 17 tasks, comprising 13 training-from-scratch settings and 4 fine-tuning settings. These tasks vary in model size, architecture, or dataset size, and the competing optimizers are hyperparameter-tuned. Our method achieves better testing performance than all competitors in 94% of the tasks. Furthermore, we conduct a theoretical analysis to examine the potential impact of our hybrid strategy on the behaviours and inherited traits of learned optimizers.