The Difference-of-Convex Algorithm (DCA) is a well-known nonconvex optimization method for minimizing functions that can be expressed as the difference of two convex ones. Many well-known optimization algorithms, such as SGD and proximal point methods, can be viewed as special cases of DCA with specific DC decompositions, making it a powerful framework for optimization. Shortcut connections, on the other hand, are a key architectural feature of modern deep neural networks, facilitating both training and optimization. We show that the gradient of a neural network with shortcuts can be obtained by applying DCA to the corresponding vanilla network, i.e., the same network without shortcut connections. From the perspective of DCA, this offers a better understanding of why networks with shortcuts are effective. Moreover, we propose a new architecture, NegNet, which does not fit the previous interpretation yet performs on par with ResNet and can still be included in the DCA framework.
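To make the DCA iteration concrete, here is a minimal sketch on a toy one-dimensional problem. The decomposition, function names, and the specific objective are illustrative assumptions, not taken from the paper: DCA minimizes f(x) = g(x) - h(x) (g, h convex) by repeatedly linearizing h at the current iterate and solving the resulting convex subproblem.

```python
def dca(x0, grad_h, solve_convex_subproblem, iters=50):
    """Generic DCA loop for min f(x) = g(x) - h(x), with g and h convex.

    At each step, h is replaced by its linearization at x_k, and the
    convex surrogate  g(x) - <grad_h(x_k), x>  is minimized exactly.
    """
    x = x0
    for _ in range(iters):
        y = grad_h(x)                   # (sub)gradient of h at x_k
        x = solve_convex_subproblem(y)  # argmin_x  g(x) - y * x
    return x

# Toy DC decomposition (illustrative): f(x) = x**4 - 2*x**2,
# with g(x) = x**4 (convex) and h(x) = 2*x**2 (convex).
# The subproblem min_x x**4 - y*x has the closed form x = (y/4)**(1/3).
grad_h = lambda x: 4.0 * x
solve = lambda y: (1 if y >= 0 else -1) * (abs(y) / 4.0) ** (1.0 / 3.0)

x_star = dca(0.5, grad_h, solve)
print(x_star)  # approaches 1.0, a global minimizer of f
```

Note that each DCA step only requires a gradient of h and a solver for a convex problem; choosing a different DC split of the same f yields a different algorithm, which is the sense in which methods like SGD and proximal point can be recovered as special cases.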