To advance deep learning methodologies in the next decade, a theoretical framework for reasoning about modern neural networks is needed. While efforts are increasing toward demystifying why deep learning is so effective, a comprehensive picture remains lacking, suggesting that a better theory is possible. We argue that a future deep learning theory should inherit three characteristics: a \textit{hierarchically} structured network architecture, parameters \textit{iteratively} optimized using stochastic gradient-based methods, and information from the data that evolves \textit{compressively}. As an instantiation, we integrate these characteristics into a graphical model called \textit{neurashed}. This model effectively explains some common empirical patterns in deep learning; in particular, neurashed offers insights into implicit regularization, the information bottleneck, and local elasticity. Finally, we discuss how neurashed can guide the development of deep learning theories.