It is widely recognized that the generalization ability of neural networks can be greatly improved by carefully designing the training procedure. The current state-of-the-art training approach uses stochastic gradient descent (SGD) or Adam together with additional regularization techniques such as weight decay, dropout, or noise injection. Optimal generalization can then be achieved only by tuning a multitude of hyperparameters through grid search, which is time-consuming and requires additional validation datasets. To address this issue, we introduce a practical PAC-Bayes training framework that is nearly tuning-free and requires no additional regularization, while achieving test performance comparable to that of SGD/Adam after a complete grid search and with extra regularization. Our proposed algorithm demonstrates the potential of PAC-Bayes training to achieve state-of-the-art performance on deep neural networks with enhanced robustness and interpretability.