Learned optimizers are a crucial component of meta-learning. Recent scalable learned optimizers have outperformed hand-designed optimizers on a variety of tasks. However, several characteristics of these models impede their widespread adoption: unstable learning curves, limited ability to handle unseen tasks and network architectures, difficult-to-control behaviours, and poor performance in fine-tuning tasks. To tackle the generalization problem in scalable learned optimizers, we propose a hybrid-update-based (HUB) optimization strategy inspired by recent advances in hard prompt tuning and result selection techniques used in large language and vision models. The approach can be easily applied to any task that involves a hand-designed or learned optimizer. By incorporating a hand-designed optimizer as the second component of the hybrid, we retain the benefits of learned optimizers while stabilizing the training process and, more importantly, improving test performance. We validate our design on a total of 17 tasks: 13 training-from-scratch settings and 4 fine-tuning settings. These tasks vary in model size, architecture, or dataset size, and the competing optimizers are hyperparameter-tuned. Our method achieves better test performance than all competitors in 94% of the tasks. Furthermore, we conduct a theoretical analysis to examine the potential impact of our hybrid strategy on the behaviours and inherited traits of learned optimizers.