Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks, it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. It may therefore be fruitful to analyze modern machine learning through a different lens. In this paper, we propose a novel information-theoretic framework, with its own notions of regret and sample complexity, for analyzing the data requirements of machine learning. Using this framework, we first work through classical examples such as scalar estimation and linear regression to build intuition and introduce general techniques. We then use the framework to study the sample complexity of learning from data generated by deep neural networks with ReLU activation units. For a particular prior distribution on weights, we establish sample complexity bounds that are simultaneously width-independent and linear in depth. This prior distribution gives rise to high-dimensional latent representations that, with high probability, admit reasonably accurate low-dimensional approximations. We conclude by corroborating our theoretical results with an experimental analysis of random single-hidden-layer neural networks.