Modern machine learning usually involves predictors in the overparameterised setting (the number of trained parameters exceeds the dataset size), and their training yields not only good performance on the training data, but also good generalisation capacity. This phenomenon challenges many theoretical results and remains an open problem. To reach a better understanding, we provide novel generalisation bounds involving gradient terms. To do so, we combine the PAC-Bayes toolbox with Poincar\'e and log-Sobolev inequalities, avoiding an explicit dependence on the dimension of the predictor space. Our results highlight the positive influence of flat minima (minima whose neighbourhood nearly minimises the learning problem as well) on generalisation performance, directly reflecting the benefits of the optimisation phase.
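For context, and with the caveat that constant conventions vary across references, recall the standard forms of these functional inequalities. A probability measure $\pi$ on $\mathbb{R}^d$ satisfies a Poincar\'e inequality with constant $C_{\mathrm{P}}$ if, for all sufficiently smooth $f$,
\[
\operatorname{Var}_\pi[f] \;\le\; C_{\mathrm{P}}\, \mathbb{E}_\pi\!\left[\|\nabla f\|^2\right],
\]
and a log-Sobolev inequality with constant $C_{\mathrm{LS}}$ if
\[
\operatorname{Ent}_\pi\!\left[f^2\right] \;\le\; 2\,C_{\mathrm{LS}}\, \mathbb{E}_\pi\!\left[\|\nabla f\|^2\right],
\qquad
\operatorname{Ent}_\pi[g] \;=\; \mathbb{E}_\pi[g\log g] - \mathbb{E}_\pi[g]\log\mathbb{E}_\pi[g].
\]
Crucially, these constants need not grow with $d$: the standard Gaussian on $\mathbb{R}^d$ satisfies both with $C_{\mathrm{P}} = C_{\mathrm{LS}} = 1$ for every $d$, which is what allows bounds built on such inequalities to avoid an explicit dimension dependence.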
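As a purely illustrative sketch of how such inequalities introduce gradient terms (the notation $w$, $R$, $\hat{R}_S$, $\rho$ is hypothetical and this is not the bound proved here), suppose the generalisation gap $f(w) = R(w) - \hat{R}_S(w)$ between population and empirical risks is smooth in the parameters $w$, and that a PAC-Bayes posterior $\rho$ satisfies a Poincar\'e inequality with constant $C_{\mathrm{P}}$. Then
\[
\operatorname{Var}_{w\sim\rho}\!\left[R(w) - \hat{R}_S(w)\right]
\;\le\;
C_{\mathrm{P}}\, \mathbb{E}_{w\sim\rho}\!\left[\big\|\nabla_w R(w) - \nabla_w \hat{R}_S(w)\big\|^2\right],
\]
so the fluctuations of the gap under $\rho$ are governed by expected gradient norms rather than by $d$. Near a flat minimum these gradients remain small over a whole neighbourhood, which is one mechanism through which flatness can tighten gradient-based generalisation bounds.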