We introduce Tune without Validation (Twin), a pipeline for tuning learning rate and weight decay without validation sets. We leverage a recent theoretical framework concerning learning phases in hypothesis space to devise a heuristic that predicts which hyper-parameter (HP) combinations yield better generalization. Twin performs a grid search of trials according to an early-/non-early-stopping scheduler and then segments the region of the grid that achieves the lowest training loss. Among the trials in this region, the weight norm correlates strongly with generalization and therefore serves as its predictor. To assess the effectiveness of Twin, we run extensive experiments on 20 image classification datasets and train several families of deep networks, including convolutional, transformer, and feed-forward models. We demonstrate proper HP selection both when training from scratch and when fine-tuning, with an emphasis on small-sample scenarios.
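The two-stage selection described above (first restrict the grid to low-training-loss trials, then rank them by weight norm) can be illustrated with a minimal sketch. The `Trial` record, the `select_hp` function, and the absolute tolerance `tol` used to define the "best region" are hypothetical. The simple threshold stands in for Twin's actual segmentation step, which is not reproduced here, and the scheduler is not modeled. The sketch also assumes that a smaller weight norm indicates better generalization. The abstract itself only states that the two are strongly correlated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    lr: float           # learning rate of this grid point
    wd: float           # weight decay of this grid point
    train_loss: float   # final training loss reached by the trial
    weight_norm: float  # L2 norm of the trained weights


def select_hp(trials: List[Trial], tol: float = 0.05) -> Trial:
    """Pick a (lr, wd) pair using only training-set signals.

    Hypothetical simplification of the pipeline: the "best region" is every
    trial whose training loss lies within `tol` of the minimum training loss;
    among those, the trial with the smallest weight norm is returned
    (assumption: lower norm -> better generalization).
    """
    if not trials:
        raise ValueError("need at least one trial")
    best_loss = min(t.train_loss for t in trials)
    # Stage 1: keep only trials that fit the training data about as well as the best one.
    region = [t for t in trials if t.train_loss <= best_loss + tol]
    # Stage 2: within that region, use the weight norm as a proxy for generalization.
    return min(region, key=lambda t: t.weight_norm)


# Toy grid (made-up numbers, for illustration only).
trials = [
    Trial(lr=1e-1, wd=5e-4, train_loss=0.02, weight_norm=40.0),
    Trial(lr=1e-1, wd=5e-5, train_loss=0.01, weight_norm=55.0),
    Trial(lr=1e-2, wd=5e-4, train_loss=0.03, weight_norm=35.0),
    # Underfitting trial: smallest norm overall, but excluded by stage 1.
    Trial(lr=1e-3, wd=5e-5, train_loss=0.90, weight_norm=20.0),
]
best = select_hp(trials)
print(best.lr, best.wd)
```

Filtering by training loss before comparing norms matters. Without it, the underfitting trial would win on norm alone even though it never fits the data.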