Well-tuned hyperparameters are crucial for obtaining good generalization behavior in neural networks. They can enforce appropriate inductive biases, regularize the model, and improve performance, especially when data is limited. In this work, we propose a simple and efficient way to optimize hyperparameters, inspired by the marginal likelihood, an optimization objective that requires no validation data. Our method partitions the training data and a neural network model into $K$ data shards and $K$ parameter partitions, respectively. Each parameter partition is associated with, and optimized only on, specific data shards. Combining these partitions into subnetworks allows us to define the ``out-of-training-sample'' loss of a subnetwork, i.e., the loss on data shards unseen by that subnetwork, as the objective for hyperparameter optimization. We demonstrate that this objective can be used to optimize a variety of different hyperparameters in a single training run, at a significantly lower computational cost than alternative methods that aim to optimize the marginal likelihood for neural networks. Lastly, we also consider hyperparameter optimization in federated learning, where retraining and cross-validation are particularly challenging.
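
The partitioning described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say which shards each partition sees, so the sketch assumes a sequential scheme in which subnetwork $k$ consists of partitions $0,\dots,k$ (all other parameters held at their initial values), is trained on shards $0,\dots,k$, and is evaluated on shard $k+1$. The linear model, the squared-error loss, and all function names are hypothetical choices made for illustration only.

```python
import numpy as np

def make_shards(n_examples, K, rng):
    """Randomly split example indices into K roughly equal data shards."""
    return np.array_split(rng.permutation(n_examples), K)

def make_param_masks(n_params, K, rng):
    """Assign every parameter to exactly one of K partitions (boolean masks)."""
    assignment = rng.permutation(n_params) % K
    return [assignment == k for k in range(K)]

def subnetwork_params(w, w_init, masks, k):
    """Subnetwork k: partitions 0..k use trained values, the rest stay at w_init.
    (Assumed construction; the abstract does not fix this detail.)"""
    active = np.zeros_like(w, dtype=bool)
    for m in masks[: k + 1]:
        active |= m
    return np.where(active, w, w_init)

def mse(w, X, y):
    """Mean squared error of a linear model, used as a stand-in loss."""
    return float(np.mean((X @ w - y) ** 2))

def train_step(w, w_init, masks, shards, X, y, lr):
    """Update partition k using only the data in shards 0..k, via subnetwork k."""
    new_w = w.copy()
    for k in range(len(shards)):
        idx = np.concatenate(shards[: k + 1])
        w_k = subnetwork_params(w, w_init, masks, k)
        resid = X[idx] @ w_k - y[idx]
        grad = 2.0 * X[idx].T @ resid / len(idx)
        # Only the entries belonging to partition k are updated here.
        new_w[masks[k]] -= lr * grad[masks[k]]
    return new_w

def out_of_training_sample_loss(w, w_init, masks, shards, X, y):
    """Average loss of subnetwork k on shard k+1, which it was never trained on."""
    losses = []
    for k in range(len(shards) - 1):
        w_k = subnetwork_params(w, w_init, masks, k)
        idx = shards[k + 1]
        losses.append(mse(w_k, X[idx], y[idx]))
    return sum(losses) / len(losses)
```

In a full method, the value returned by `out_of_training_sample_loss` would be differentiated with respect to the hyperparameters (for example, a regularization strength or input transformation) alongside the parameter updates; that step is omitted here to keep the sketch short.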