We contribute towards resolving the open question of how many hidden layers are required in ReLU networks to exactly represent all continuous and piecewise linear functions on $\mathbb{R}^d$. While the question has been resolved in special cases, the best known lower bound in general is still 2 hidden layers. We focus on neural networks that are compatible with certain polyhedral complexes, more precisely with the braid fan. For such neural networks, we prove a non-constant lower bound of $\Omega(\log\log d)$ hidden layers required to exactly represent the maximum of $d$ numbers. Additionally, under this compatibility assumption, we give a combinatorial proof that 3 hidden layers are necessary to compute the maximum of 5 numbers; previously, this had only been verified through an extensive computation. Finally, we show that a natural generalization of the best known upper bound to maxout networks is not tight, by demonstrating that a rank-3 maxout layer followed by a rank-2 maxout layer suffices to represent the maximum of 7 numbers.
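As an illustrative sketch (for orientation, under the standard pairwise-tournament construction) of why $\lceil \log_2 d \rceil$ hidden layers always suffice for the maximum of $d$ numbers, i.e., the upper bound against which such lower bounds are measured: the maximum of two numbers admits a one-hidden-layer ReLU representation,
\[
\max\{a,b\} \;=\; \mathrm{ReLU}(a-b) + \mathrm{ReLU}(b) - \mathrm{ReLU}(-b),
\]
and iterating it pairwise,
\[
\max\{x_1,\dots,x_d\} \;=\; \max\Bigl\{\max\{x_1,\dots,x_{\lceil d/2\rceil}\},\; \max\{x_{\lceil d/2\rceil+1},\dots,x_d\}\Bigr\},
\]
computes the maximum of $2^k$ numbers with $k$ hidden layers. The open question is whether this logarithmic depth is actually necessary.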