Deep neural networks have significantly alleviated the burden of feature engineering, but comparable effort is now required to design effective architectures for them. Furthermore, as networks have grown very large, substantial resources are spent on reducing their size. Both challenges can be addressed by sparsifying over-complete models. In this study, we propose a fully differentiable sparsification method for deep neural networks, which zeroes out unimportant parameters by directly optimizing a regularized objective function with stochastic gradient descent. Consequently, the proposed method learns both the sparsified structure and the weights of a network in an end-to-end manner. It can be applied directly to various modern deep neural networks and requires minimal modification to the training process. To the best of our knowledge, this is the first fully differentiable sparsification method.
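To make the idea of zeroing out parameters by plain gradient descent on a regularized objective concrete, here is a minimal sketch on a toy linear regression problem. It is not the formulation proposed in this work, which the abstract does not spell out. It assumes a generic thresholded reparameterization, in which each weight is written as the difference of two non-negative gates `relu(p - thresh)` and `relu(q - thresh)`, and an L1 penalty on the gates. The function name `train_sparse_linear`, the hyperparameters, and the synthetic data are all hypothetical, and full-batch gradient descent stands in for stochastic gradient descent to keep the example deterministic.

```python
import numpy as np

def train_sparse_linear(X, y, lam=0.2, thresh=0.1, lr=0.05, steps=2000):
    """Fit y ~ X @ w with w = relu(p - thresh) - relu(q - thresh).

    Illustrative assumption, not the paper's method: an L1 penalty on the
    two non-negative gates pushes unneeded ones below `thresh`, where the
    ReLU outputs exactly zero and its gradient vanishes.
    """
    n, d = X.shape
    p = np.full(d, thresh + 0.4)  # positive-part parameters
    q = np.full(d, thresh + 0.4)  # negative-part parameters
    for _ in range(steps):
        w = np.maximum(p - thresh, 0.0) - np.maximum(q - thresh, 0.0)
        # Gradient of the mean squared error with respect to w.
        grad_w = (2.0 / n) * X.T @ (X @ w - y)
        # Chain rule through the ReLU gates; the mask is the ReLU derivative,
        # and `lam` is the derivative of the L1 penalty on an active gate.
        grad_p = (grad_w + lam) * (p > thresh)
        grad_q = (-grad_w + lam) * (q > thresh)
        p -= lr * grad_p
        q -= lr * grad_q
    return np.maximum(p - thresh, 0.0) - np.maximum(q - thresh, 0.0)

# Synthetic data: only the first two of ten features matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[0], w_true[1] = 3.0, -2.0
y = X @ w_true + 0.05 * rng.standard_normal(200)

w_hat = train_sparse_linear(X, y)
print(np.round(w_hat, 3))
```

In this sketch, a gate that falls below the threshold stays at exactly zero, so the sparsity pattern and the surviving weights come out of the same optimization loop without a separate pruning step. The L1 penalty also shrinks the surviving weights slightly toward zero, which is the usual trade-off of this kind of regularizer.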