A ReLU network is a piecewise linear function over polytopes. Understanding the properties of these polytopes is of fundamental importance for the research and development of neural networks. So far, both theoretical and empirical studies of polytopes have stayed at the level of counting them, which is far from a complete characterization. To advance this characterization, we propose to study the shapes of polytopes via the number of simplices obtained by triangulating each polytope. By computing and analyzing the histogram of simplex counts across polytopes, we find that a ReLU network has relatively simple polytopes under both initialization and gradient descent, although these polytopes can in theory be rather diverse and complicated. This finding can be regarded as a novel implicit bias. Next, we use a nontrivial combinatorial derivation to explain theoretically why adding depth does not create more complicated polytopes: we bound the average number of faces of polytopes by a function of the dimensionality. Our results concretely reveal what kinds of simple functions a network learns and how it partitions the input space. Moreover, because it characterizes the shape of polytopes, the number of simplices can serve as a lever for other problems, \textit{e.g.}, as a generic functional complexity measure to explain the power of popular shortcut networks such as ResNet, and to analyze the impact of different regularization strategies on a network's space partition.
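The core measurement described above (find the linear regions, triangulate each one, count simplices, and histogram the counts) can be sketched for the simplest case. This is a minimal sketch under assumptions that do not come from the source: a single-hidden-layer ReLU network on 2D inputs, linear regions discovered by sampling activation patterns on a grid (a heuristic that can miss very small regions), regions clipped to the box [-1, 1]^2, and a fan triangulation, under which a convex polygon with k vertices splits into k - 2 triangles. In higher dimensions the simplex count depends on the chosen triangulation, which this sketch does not address. The names `clip_halfplane`, `polygon_area`, and `simplex_histogram` are hypothetical.

```python
from collections import Counter

import numpy as np


def clip_halfplane(poly, a, c, eps=1e-9):
    """One Sutherland-Hodgman step: keep the part of the convex polygon
    `poly` (list of 2D points) where a @ x + c >= 0."""
    out = []
    n = len(poly)
    for i in range(n):
        p, q = poly[i], poly[(i + 1) % n]
        fp, fq = a @ p + c, a @ q + c
        if fp >= 0:
            out.append(p)
        if (fp >= 0) != (fq >= 0):
            # Edge p->q crosses the line a @ x + c = 0: add the crossing point.
            t = fp / (fp - fq)
            out.append(p + t * (q - p))
    # Remove consecutive duplicates (they appear when the line hits a vertex).
    dedup = []
    for v in out:
        if not dedup or np.linalg.norm(v - dedup[-1]) > eps:
            dedup.append(v)
    if len(dedup) > 1 and np.linalg.norm(dedup[0] - dedup[-1]) <= eps:
        dedup.pop()
    return dedup


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    xy = np.array(poly)
    x, y = xy[:, 0], xy[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def simplex_histogram(W, b, box=1.0, grid=200):
    """Histogram {simplices per region: number of regions} for the linear
    regions of x -> relu(W @ x + b), restricted to [-box, box]^2."""
    W = np.asarray(W, dtype=float)
    b = np.asarray(b, dtype=float)
    # Sample grid cell centers and record each point's activation pattern.
    h = 2 * box / grid
    ticks = np.linspace(-box + h / 2, box - h / 2, grid)
    xx, yy = np.meshgrid(ticks, ticks)
    pts = np.stack([xx.ravel(), yy.ravel()], axis=1)
    patterns = {tuple(row) for row in (pts @ W.T + b > 0).astype(int)}

    square = [np.array(v, dtype=float)
              for v in [(-box, -box), (box, -box), (box, box), (-box, box)]]
    counts = Counter()
    for s in patterns:
        # The region of pattern s is the intersection of one half-plane per neuron.
        poly = square
        for w_i, b_i, s_i in zip(W, b, s):
            sign = 1.0 if s_i else -1.0
            poly = clip_halfplane(poly, sign * w_i, sign * b_i)
            if len(poly) < 3:
                break
        if len(poly) >= 3 and polygon_area(poly) > 1e-12:
            # A convex polygon with k vertices fans into k - 2 triangles.
            counts[len(poly) - 2] += 1
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 2))
    b = 0.5 * rng.standard_normal(8)
    hist = simplex_histogram(W, b)
    print(sorted(hist.items()))
    # One possible scalar summary: total simplices over all regions.
    print(sum(k * n for k, n in hist.items()))
```

For example, two axis-aligned neurons (`W = I`, `b = 0`) cut the square into four quadrants, each a quadrilateral that triangulates into two triangles. Grid sampling was chosen here only because it is short; an exact region enumeration would be needed for any serious measurement.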