Monotonic linear interpolation (MLI), the property that the loss and accuracy are monotonic along the line connecting a random initialization to the minimizer it converges to, is commonly observed in the training of neural networks. This phenomenon may seem to suggest that optimizing neural networks is easy. In this paper, we show that the MLI property is not necessarily related to the hardness of the optimization problem, and that empirical observations of MLI for deep neural networks depend heavily on the biases. In particular, we show that linearly interpolating the weights and linearly interpolating the biases influence the final output in very different ways, and that when different classes have different last-layer biases in a deep network, both the loss and the accuracy interpolation curves exhibit a long plateau (which the existing theory of MLI cannot explain). Using a simple model, we also show how the last-layer biases for different classes can differ even on a perfectly balanced dataset. Empirically, we demonstrate that similar intuitions hold for practical networks and realistic datasets.
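To make the interpolation concrete, the following is a minimal sketch of how one might evaluate the loss along the line between an initialization and a minimizer and check whether it is monotonic. It is not the experimental setup of the paper: it assumes a toy least-squares problem (convex, so monotonicity along this line is guaranteed), and all names (`interpolate`, `loss_along_path`, `is_monotonically_nonincreasing`, `mse`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def interpolate(theta_init, theta_final, alpha):
    """Point on the line from theta_init (alpha=0) to theta_final (alpha=1)."""
    return (1.0 - alpha) * theta_init + alpha * theta_final


def loss_along_path(loss_fn, theta_init, theta_final, num_points=11):
    """Evaluate loss_fn at evenly spaced points on the interpolation line."""
    alphas = np.linspace(0.0, 1.0, num_points)
    losses = np.array(
        [loss_fn(interpolate(theta_init, theta_final, a)) for a in alphas]
    )
    return alphas, losses


def is_monotonically_nonincreasing(values, tol=1e-12):
    """True if no step along the path increases the value by more than tol."""
    return bool(np.all(np.diff(values) <= tol))


# Toy data (an assumption for illustration): linear regression with a bias,
# where the last column of X_aug multiplies the bias term.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X_aug = np.hstack([X, np.ones((50, 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 + 0.1 * rng.normal(size=50)


def mse(theta):
    """Mean squared error of the linear model with parameters theta."""
    return float(np.mean((X_aug @ theta - y) ** 2))


theta_init = rng.normal(size=4)                           # random initialization
theta_final, *_ = np.linalg.lstsq(X_aug, y, rcond=None)   # exact minimizer

alphas, losses = loss_along_path(mse, theta_init, theta_final)
mli_holds = is_monotonically_nonincreasing(losses)
```

Because the toy loss is convex and `theta_final` is its exact minimizer, `mli_holds` is true here by construction. For a deep network the same check would be applied to the flattened weights and biases, and it is in that setting that the plateau described above can appear.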