When training deep learning models, performance depends largely on the selected hyperparameters. However, hyperparameter optimization (HPO) is often one of the most expensive parts of model design. Classical HPO methods treat this as a black-box optimization problem. In contrast, gray-box HPO methods, which incorporate more information about the setup, have emerged as a promising direction for more efficient optimization, for example by using intermediate loss evaluations to terminate poorly performing configurations early. In this work, we propose an HPO method for neural networks that uses logged checkpoints of the trained weights to guide future hyperparameter selections. Our method, Forecasting Model Search (FMS), embeds weights into a Gaussian process deep kernel surrogate model, using a permutation-invariant graph metanetwork to make data-efficient use of the logged network weights. To facilitate reproducibility and further research, we open-source our code at https://github.com/NVlabs/forecasting-model-search.
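The idea of feeding checkpoint weights into a Gaussian process surrogate can be illustrated with a minimal sketch. This is not the FMS implementation: as a simplifying assumption, a fixed mean-pooling over hidden neurons stands in for the learned graph metanetwork, and a plain RBF kernel on the resulting features stands in for the learned deep kernel. All function names (`embed_mlp`, `surrogate_features`, `gp_posterior`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def embed_mlp(W1, b1, W2):
    """Permutation-invariant embedding of a one-hidden-layer MLP checkpoint.

    W1: (hidden, in), b1: (hidden,), W2: (out, hidden).
    Each hidden neuron is described by its incoming weights, bias, and
    outgoing weights. Mean-pooling over neurons makes the result independent
    of the order of the hidden units (an assumed stand-in for a metanetwork).
    """
    per_neuron = np.concatenate([W1, b1[:, None], W2.T], axis=1)
    feats = np.concatenate([np.tanh(per_neuron), per_neuron ** 2], axis=1)
    return feats.mean(axis=0)


def surrogate_features(hparams, W1, b1, W2):
    """Concatenate hyperparameters with the weight embedding."""
    return np.concatenate([np.asarray(hparams, dtype=float), embed_mlp(W1, b1, W2)])


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale ** 2)


def gp_posterior(X_train, y_train, X_test, noise=1e-6, lengthscale=1.0):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = 1.0 - (v ** 2).sum(axis=0)  # prior variance k(x, x) is 1 for RBF
    return mean, np.maximum(var, 0.0)
```

In use, each logged checkpoint would contribute one row built with `surrogate_features` (for instance, the log learning rate plus the weight embedding) and one target such as the final validation loss; `gp_posterior` then scores candidate configurations. In a learned deep kernel, the embedding parameters would be trained jointly with the GP rather than fixed as they are here.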