Given a training set, a loss function, and a neural network architecture, it is often taken for granted that optimal network parameters exist, and common practice is to apply available optimization algorithms to search for them. In this work, we show that the existence of an optimal solution is not always guaranteed, especially in the context of {\em sparse} ReLU neural networks. In particular, we first show that optimization problems involving deep networks with certain sparsity patterns do not always have optimal parameters, and that optimization algorithms may then diverge. Via a new topological relation between sparse ReLU neural networks and their linear counterparts, we derive, using existing tools from real algebraic geometry, an algorithm to verify whether a given sparsity pattern suffers from this issue. We then prove that a global optimum exists for every concrete optimization problem involving a shallow sparse ReLU neural network of output dimension one. Overall, the analysis rests on two topological properties of the space of functions implementable as sparse ReLU neural networks: a best approximation property and a closedness property, both in the uniform norm. These properties are studied both for finite domains, corresponding to practical training on finite training sets, and for more general domains such as the unit cube. This yields conditions under which, for a given sparsity pattern, an optimum is guaranteed to exist. The results apply not only to several sparsity patterns proposed in recent works on network pruning/sparsification, but also to classical dense neural networks, including architectures not covered by existing results.
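To make the non-existence phenomenon concrete, the following minimal sketch uses an illustration that is an assumption of this note, not a construction taken from the abstract: a two-layer {\em linear} network whose layers are constrained to lower-triangular and upper-triangular supports (an LU-type sparsity pattern, with no ReLU). The target is the 2x2 permutation matrix that swaps coordinates. No product of a lower-triangular and an upper-triangular 2x2 matrix equals this target exactly, since matching the first row forces the top-left entry of the upper factor to be zero, which in turn forces the bottom-left entry of the product to be zero instead of one. Yet the loss can be driven arbitrarily close to zero by letting some parameters grow without bound. The names `matmul2`, `TARGET`, and `loss_and_param_size` are hypothetical helpers introduced only for this sketch.

```python
import math

def matmul2(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Target linear map: the permutation that swaps the two coordinates.
TARGET = [[0.0, 1.0], [1.0, 0.0]]

def loss_and_param_size(eps):
    """Frobenius loss and largest |parameter| for one point of a diverging family.

    Assumption for illustration: layer 1 has upper-triangular support (U),
    layer 2 has lower-triangular support (L), and the network computes L @ U.
    """
    L = [[1.0, 0.0], [1.0 / eps, 1.0]]
    U = [[eps, 1.0], [0.0, -1.0 / eps]]
    P = matmul2(L, U)  # equals [[eps, 1], [1, 0]] up to rounding
    loss = math.sqrt(sum((P[i][j] - TARGET[i][j]) ** 2
                         for i in range(2) for j in range(2)))
    size = max(abs(x) for M in (L, U) for row in M for x in row)
    return loss, size

for eps in (1e-1, 1e-2, 1e-3):
    loss, size = loss_and_param_size(eps)
    print(f"eps={eps:g}  loss={loss:.3e}  largest |parameter|={size:.3e}")
```

As `eps` shrinks, the loss shrinks with it while the largest parameter grows like `1/eps`: the infimum of the loss is zero but is not attained, which is the kind of behavior under which an optimization algorithm may diverge.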