We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly representable functions strictly increases by adding more layers (with no restrictions on size). As a by-product of our investigations, we settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative. We also present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
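As a purely illustrative aside (not part of the abstract itself), the following minimal Python sketch shows one standard construction behind logarithmic-depth exact representations of piecewise linear functions: the maximum of two numbers is exactly a one-hidden-layer ReLU network, and nesting this gadget computes the maximum of 2^k numbers with k hidden layers. The helper names relu, max2, and max_tree are ours and chosen only for illustration; the sketch is not the paper's construction.

    import numpy as np

    def relu(x):
        # Rectified linear unit, applied componentwise.
        return np.maximum(0.0, x)

    def max2(a, b):
        # max(a, b) = relu(a - b) + relu(b) - relu(-b):
        # a single hidden layer with three ReLU neurons; the identity
        # b = relu(b) - relu(-b) is itself routed through the hidden layer.
        return relu(a - b) + relu(b) - relu(-b)

    def max_tree(values):
        # Pairing inputs and recursing computes the maximum of 2^k numbers
        # with k hidden layers, i.e. depth logarithmic in the input size.
        if len(values) == 1:
            return values[0]
        mid = len(values) // 2
        return max2(max_tree(values[:mid]), max_tree(values[mid:]))

    x = [3.0, -1.0, 7.5, 2.0]
    print(max_tree(x), max(x))  # both print 7.5

Whether such nesting is ever necessary, i.e. whether strictly more functions become exactly representable as depth grows, is the question the abstract refers to.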