We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
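As a small illustration of what "exactly representable" means here, the sketch below uses a standard identity, max(x, y) = relu(x - y) + relu(y) - relu(-y), to write the maximum of two numbers as a one-hidden-layer ReLU network, and then composes it to get the maximum of four numbers with two hidden layers. This example is not taken from the paper; the function names `relu`, `max2`, and `max4` are hypothetical and chosen only for illustration.

```python
def relu(t):
    """ReLU activation: max(t, 0)."""
    return max(t, 0.0)


def max2(x, y):
    # One hidden layer with three ReLU units (illustrative construction):
    #   hidden units: relu(x - y), relu(y), relu(-y)
    #   output:       relu(x - y) + relu(y) - relu(-y)
    # Since relu(y) - relu(-y) == y, this equals relu(x - y) + y == max(x, y).
    return relu(x - y) + relu(y) - relu(-y)


def max4(a, b, c, d):
    # Two hidden layers: the first computes the pairwise maxima,
    # the second computes the maximum of those two results.
    # The linear output of the first max2 folds into the linear input
    # of the second, so no extra layer is introduced.
    return max2(max2(a, b), max2(c, d))
```

The representation is exact rather than approximate: the network output coincides with the maximum for every input, which is the notion of representability that the abstract contrasts with universal approximation.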