Deep artificial neural networks achieve surprising generalization abilities that remain poorly understood. In this paper, we present a new approach to analyzing generalization for deep feed-forward ReLU networks that exploits the degree of sparsity achieved in the hidden-layer activations. By developing a framework that accounts for this reduced effective model size for each input sample, we are able to show fundamental trade-offs between sparsity and generalization. Importantly, our results make no strong assumptions about the degree of sparsity achieved by the model, and they improve over recent norm-based approaches. We illustrate our results numerically, demonstrating non-vacuous bounds when coupled with data-dependent priors in specific settings, even in over-parametrized models.