It is widely recognized that the generalization ability of neural networks can be greatly improved by carefully designing the training procedure. The current state-of-the-art approach trains with stochastic gradient descent (SGD) or Adam, combined with additional regularization techniques such as weight decay, dropout, or noise injection. Achieving optimal generalization with this approach requires tuning a multitude of hyperparameters through grid search, which can be time-consuming and necessitates additional validation datasets. To address this issue, we introduce a practical PAC-Bayes training framework that is nearly tuning-free and requires no additional regularization, while achieving test performance comparable to that of SGD/Adam after a complete grid search and with extra regularization. Our proposed algorithm demonstrates the remarkable potential of PAC-Bayes training to achieve state-of-the-art performance on deep neural networks with enhanced robustness and interpretability.