A ReLU network is a piecewise linear function over polytopes. Understanding the properties of these polytopes is of fundamental importance for the research and development of neural networks. So far, both theoretical and empirical studies of such polytopes have stayed at the level of counting their number, which is far from a complete characterization. Here, we propose to study the shapes of polytopes through the number of their faces. By computing and analyzing the histogram of face counts across polytopes, we find that a ReLU network has relatively simple polytopes both at initialization and after gradient descent, even though these polytopes can, in principle, be made rather diverse and complicated by specific designs. This finding can be regarded as a kind of generalized implicit bias, subject to the intrinsic geometric constraints of a ReLU network's space partition. Next, we perform a combinatorial analysis to explain why adding depth does not produce more complicated polytopes, by bounding the average number of faces of polytopes in terms of the dimensionality. Our results concretely reveal what kind of simple functions a network learns and what happens when a network goes deep. Moreover, by characterizing the shape of polytopes, the number of faces can serve as a novel lever for other problems, \textit{e.g.}, as a generic tool to explain the power of popular shortcut networks such as ResNet and to analyze the impact of different regularization strategies on a network's space partition.
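To make the face-counting idea concrete, the following is a minimal sketch, not the paper's implementation: it assumes a one-hidden-layer ReLU network on $\mathbb{R}^2$ with random Gaussian weights, restricts every activation region to a bounding box so that it is a bounded polytope, and counts facets by an LP-based redundancy check with \texttt{scipy.optimize.linprog}. The helper names (\texttt{region\_constraints}, \texttt{count\_facets}) are illustrative and not taken from the paper.

\begin{verbatim}
# Sketch: histogram of facet counts over the linear regions (polytopes)
# of a toy one-hidden-layer ReLU network on R^2. Assumptions: random
# Gaussian weights, a bounding box |x_j| <= R, facets counted via an
# LP-based redundancy check. Illustrative only, not the paper's code.
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, h = 2, 6                      # input dimension, hidden width
W = rng.standard_normal((h, d))  # hidden-layer weights
b = rng.standard_normal(h)       # hidden-layer biases

# Bounding box |x_j| <= R keeps every region bounded.
R = 10.0
box_A = np.vstack([np.eye(d), -np.eye(d)])
box_b = np.full(2 * d, R)

def region_constraints(pattern):
    """Write the region of an activation pattern as A x <= c."""
    # active unit (s_i = 1): w_i.x + b_i >= 0  ->  -w_i.x <= b_i
    # inactive unit (s_i = 0): w_i.x + b_i <= 0 ->  w_i.x <= -b_i
    signs = np.where(np.array(pattern) > 0, -1.0, 1.0)
    A = np.vstack([signs[:, None] * W, box_A])
    c = np.concatenate([-signs * b, box_b])
    return A, c

def is_nonempty(A, c):
    """Feasibility check: does some x satisfy A x <= c?"""
    res = linprog(np.zeros(A.shape[1]), A_ub=A, b_ub=c,
                  bounds=[(None, None)] * A.shape[1])
    return res.status == 0

def count_facets(A, c, tol=1e-7):
    """Count non-redundant inequalities (facets): constraint i is kept
    iff maximizing a_i.x over the remaining constraints exceeds c_i."""
    facets = 0
    for i in range(A.shape[0]):
        mask = np.arange(A.shape[0]) != i
        res = linprog(-A[i], A_ub=A[mask], b_ub=c[mask],
                      bounds=[(None, None)] * A.shape[1])
        if res.status == 3 or (res.status == 0 and -res.fun > c[i] + tol):
            facets += 1
    return facets

# Histogram of facet counts across all realized activation patterns.
histogram = {}
for pattern in itertools.product([0, 1], repeat=h):
    A, cvec = region_constraints(pattern)
    if is_nonempty(A, cvec):
        f = count_facets(A, cvec)
        histogram[f] = histogram.get(f, 0) + 1
print("facets -> number of polytopes:", dict(sorted(histogram.items())))
\end{verbatim}

Enumerating all $2^h$ activation patterns is only practical for toy widths; the point of the sketch is the shape statistic itself, namely how many faces each realized polytope has, aggregated into a histogram.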