This paper provides theoretical insights into why and how deep learning can generalize well, despite its large capacity, complexity, possible algorithmic instability, nonrobustness, and sharp minima, responding to an open question in the literature. We also discuss approaches to provide non-vacuous generalization guarantees for deep learning. Based on theoretical observations, we propose new open problems and discuss the limitations of our results.