Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insofar as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can reparametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks, prove conditions under which the procedure is well-behaved, and propose a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
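As a minimal illustration of why the parameter-space mode and the function-space mode can disagree, the sketch below uses a hypothetical one-parameter model f_θ(x) = exp(θ)·x with an assumed Gaussian parameter posterior N(μ, σ²). Neither the model nor the posterior comes from the paper; they are chosen only because the answer can be checked by hand. The function is identified with its slope a = exp(θ). By the change-of-variables formula, the induced density over a picks up a Jacobian factor 1/a, so it is log-normal with mode exp(μ − σ²), whereas mapping the parameter MAP θ* = μ through the model gives the slope exp(μ).

```python
import numpy as np

# Hypothetical toy setup (not from the paper): model f_theta(x) = exp(theta) * x,
# with an assumed Gaussian parameter posterior N(MU, SIGMA^2).
MU, SIGMA = 0.5, 0.8


def log_param_posterior(theta):
    """Log density of the parameter posterior N(MU, SIGMA^2), up to a constant."""
    return -0.5 * ((theta - MU) / SIGMA) ** 2


def log_function_posterior(a):
    """Log density of the induced distribution over the slope a = exp(theta).

    Change of variables: p(a) = p_theta(log a) * |d(log a)/da| = p_theta(log a) / a,
    so the log-Jacobian contributes -log(a).
    """
    return log_param_posterior(np.log(a)) - np.log(a)


# Parameter-space MAP via grid search, then mapped to a function (its slope).
thetas = np.linspace(-3.0, 3.0, 600001)
theta_map = thetas[np.argmax(log_param_posterior(thetas))]
slope_from_param_map = np.exp(theta_map)

# Function-space MAP: maximize the induced density over slopes directly.
slopes = np.linspace(1e-3, 5.0, 500001)
slope_function_map = slopes[np.argmax(log_function_posterior(slopes))]

print(f"slope from parameter MAP: {slope_from_param_map:.4f}")  # approx. exp(MU)
print(f"function-space MAP slope: {slope_function_map:.4f}")    # approx. exp(MU - SIGMA**2)
```

In this toy case the two estimates differ by the factor exp(−σ²), and the gap comes entirely from the −log a Jacobian term. The same mechanism is why the location of a parameter-space density's maximum depends on how the model is parametrized.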