Convolutional neural networks are widely used in imaging and image recognition. Learning such networks from training data amounts to minimizing a non-convex function, which makes the analysis of standard optimization methods, such as variants of (stochastic) gradient descent, challenging. In this article we study the simplified setting of linear convolutional networks. We show that, under a mild condition on the training data, the gradient flow (interpreted as a continuous-time abstraction of gradient descent) applied to the empirical risk defined via certain loss functions, including the square loss, always converges to a critical point.
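To fix ideas, here is the standard formulation of this setup in our notation, which need not match the paper's: write $\theta$ for the concatenated parameters of the network $f_\theta$ and $\mathcal{L}$ for the empirical risk, e.g.\ the square loss $\mathcal{L}(\theta) = \tfrac{1}{2} \sum_{i=1}^{n} \| f_\theta(x_i) - y_i \|^2$ over training pairs $(x_1, y_1), \dots, (x_n, y_n)$. The gradient flow is then the solution of the ordinary differential equation
\[
\dot{\theta}(t) = -\nabla \mathcal{L}(\theta(t)), \qquad \theta(0) = \theta_0,
\]
of which gradient descent with step size $\eta > 0$, namely $\theta_{k+1} = \theta_k - \eta \nabla \mathcal{L}(\theta_k)$, is the forward Euler discretization. Convergence to a critical point means $\theta(t) \to \theta^\ast$ as $t \to \infty$ with $\nabla \mathcal{L}(\theta^\ast) = 0$.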