We consider functions from the real numbers to the real numbers that are output by a neural network with one hidden layer, arbitrary width, and ReLU activation function. We assume that the parameters of the neural network are chosen at random according to various probability distributions, and we compute the expected distribution of the points of non-linearity. We use these results to explain why the network may be biased towards outputting functions with simpler geometry, and why certain functions with low information-theoretic complexity are nonetheless hard for a neural network to approximate.
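As a rough illustration of the setting, the sketch below builds a one-hidden-layer ReLU network on the real line and lists its candidate points of non-linearity. It assumes the parameterization f(x) = sum_i a_i * max(0, w_i * x + b_i) + c, so that hidden unit i changes slope at x_i = -b_i / w_i. The function names `network` and `breakpoints`, the width of 1000, and the choice of independent standard normal parameters are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def network(x, w, b, a, c=0.0):
    """Evaluate f(x) = sum_i a_i * relu(w_i * x + b_i) + c at each point of x."""
    x = np.asarray(x, dtype=float)
    # Pre-activations have shape (len(x), width); b broadcasts across rows.
    hidden = np.maximum(0.0, np.outer(x, w) + b)
    return hidden @ a + c

def breakpoints(w, b):
    """Sorted points x_i = -b_i / w_i where a hidden unit switches on or off.

    Units with w_i == 0 are constant in x and contribute no breakpoint.
    """
    w = np.asarray(w, dtype=float)
    b = np.asarray(b, dtype=float)
    mask = w != 0
    return np.sort(-b[mask] / w[mask])

# Assumed sampling scheme: independent standard normal weights and biases.
rng = np.random.default_rng(0)
width = 1000
w = rng.standard_normal(width)
b = rng.standard_normal(width)
a = rng.standard_normal(width)

pts = breakpoints(w, b)
# Fraction of breakpoints that fall inside [-1, 1] for this sample.
frac_inside = np.mean(np.abs(pts) <= 1.0)
print(f"{len(pts)} breakpoints, fraction in [-1, 1]: {frac_inside:.3f}")
```

Under this particular Gaussian assumption each breakpoint is a ratio of two independent standard normals, which follows a standard Cauchy distribution, so about half of the breakpoints should land in [-1, 1] and the rest spread out over heavy tails. Other parameter distributions give different breakpoint densities.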