We study the loss landscape of mildly overparameterized two-layer ReLU neural networks on a generic finite input dataset under the squared error loss. Our approach bounds the dimension of the sets of local and global minima using the rank of the Jacobian of the parameterization map. Using results on random binary matrices, we show that most activation patterns correspond to parameter regions with no bad differentiable local minima. Furthermore, for one-dimensional input data, we show that most activation regions realizable by the network contain a high-dimensional set of global minima and no bad local minima. We experimentally confirm these results by finding a phase transition, governed by the amount of overparameterization, from most regions having a full-rank Jacobian to many regions having a rank-deficient Jacobian.
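To make the role of the Jacobian rank concrete, the following is a minimal sketch and not the authors' experimental code. It assumes a bias-free network f(x) = sum_i v[i] * relu(W[i] @ x), Gaussian data, and Gaussian parameter draws. The names `jacobian` and `full_rank_fraction` are hypothetical. It samples parameters at random rather than enumerating activation regions, so it weights regions by their Gaussian measure instead of counting them.

```python
import numpy as np


def jacobian(W, v, X):
    """Jacobian of the map (W, v) -> (f(x_1), ..., f(x_n)).

    Assumed network (no biases): f(x) = sum_i v[i] * relu(W[i] @ x).
    W has shape (d1, d0), v has shape (d1,), X has shape (n, d0).
    Returns an (n, d1 * d0 + d1) matrix; rows index data points.
    """
    pre = X @ W.T                   # (n, d1) pre-activations
    act = (pre > 0).astype(float)   # (n, d1) activation pattern
    # d f(x_j) / d W[i, k] = v[i] * act[j, i] * X[j, k]
    dW = (act * v)[:, :, None] * X[:, None, :]   # (n, d1, d0)
    # d f(x_j) / d v[i] = relu(pre[j, i])
    dv = pre * act                               # (n, d1)
    return np.concatenate([dW.reshape(len(X), -1), dv], axis=1)


def full_rank_fraction(n, d0, d1, trials=200, seed=0):
    """Fraction of random parameter draws whose Jacobian has rank n."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d0))  # stand-in for a generic dataset
    hits = 0
    for _ in range(trials):
        W = rng.standard_normal((d1, d0))
        v = rng.standard_normal(d1)
        rank = np.linalg.matrix_rank(jacobian(W, v, X))
        hits += int(rank == n)
    return hits / trials
```

Calling `full_rank_fraction` for a fixed `n` and `d0` over a range of widths `d1` gives a rough picture of how often the Jacobian attains rank `n` as the parameter count grows. When `d1 * d0 + d1 < n`, full rank is impossible and the fraction is zero by construction.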