Convex functions and their gradients play a critical role in mathematical imaging, from proximal optimization to Optimal Transport. The success of deep learning has led many to use learning-based methods, where fixed functions or operators are replaced by learned neural networks. Despite their empirical superiority, establishing rigorous guarantees for these methods often requires imposing structural constraints on neural architectures, in particular convexity. The most popular way to do so is to use so-called Input Convex Neural Networks (ICNNs). To explore the expressivity of ICNNs, we provide necessary and sufficient conditions for a ReLU neural network to be convex. These characterizations are based on products of weights and activations, and take a simple form for any architecture in the path-lifting framework. As particular applications, we study our characterizations in depth for 1- and 2-hidden-layer neural networks: we show that every convex function implemented by a 1-hidden-layer ReLU network can also be expressed by an ICNN with the same architecture; however, this property no longer holds with more layers. Finally, we provide a numerical procedure that allows an exact check of convexity for ReLU neural networks with a large number of affine regions.
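As an illustration of what an exact convexity check can look like in the simplest setting, the following is a minimal sketch for a scalar-output, 1-hidden-layer ReLU network $f(x) = \sum_j v_j\,\mathrm{ReLU}(w_j^\top x + b_j) + a^\top x + c$. It is not the procedure or the characterization developed in the paper; it only uses the elementary fact that a continuous piecewise-linear function is convex iff the gradient jump across every breakpoint hyperplane is nonnegative in the crossing direction, which here reduces to checking $\sum_j v_j \|w_j\| \ge 0$ over neurons sharing the same hyperplane. The function name `is_convex_one_hidden_layer` is hypothetical.

```python
# Illustrative sketch (not the paper's procedure): exact convexity test for a
# scalar-output, 1-hidden-layer ReLU network
#     f(x) = sum_j v[j] * relu(W[j] @ x + b[j]) + a @ x + c.
# f is piecewise linear, so it is convex iff the gradient jump across every
# hyperplane {W[j] @ x + b[j] = 0} is nonnegative in the crossing direction;
# grouping neurons that share the same hyperplane, this reduces to
# sum_j v[j] * ||W[j]|| >= 0 within each group.
import numpy as np


def is_convex_one_hidden_layer(W, b, v, tol=1e-9):
    """W: (m, d) hidden weights, b: (m,) hidden biases, v: (m,) output weights."""
    groups = {}
    for wj, bj, vj in zip(W, b, v):
        nrm = np.linalg.norm(wj)
        if nrm < tol:          # dead/constant neuron: affine contribution only
            continue
        u, beta = wj / nrm, bj / nrm
        # Canonical orientation of the hyperplane {u . x + beta = 0}:
        # flip the sign so the first non-negligible coordinate of u is positive.
        k = np.argmax(np.abs(u) > tol)
        if u[k] < 0:
            u, beta = -u, -beta
        key = tuple(np.round(np.append(u, beta), 8))
        groups[key] = groups.get(key, 0.0) + vj * nrm
    # Convex iff every grouped gradient jump is nonnegative.
    return all(s >= -tol for s in groups.values())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(5, 3)), rng.normal(size=5)
    print(is_convex_one_hidden_layer(W, b, np.abs(rng.normal(size=5))))  # True: all v_j >= 0
    print(is_convex_one_hidden_layer(W, b, -np.ones(5)))                 # False
```

Brute-force enumeration of affine regions grows exponentially with depth and width, which is why a direct test of this kind does not scale beyond the shallow case and a dedicated procedure, as developed in the paper, is needed for networks with a large number of affine regions.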