Threshold activation functions are highly desirable in neural networks because they can be implemented efficiently in hardware. Moreover, their mode of operation is more interpretable and resembles that of biological neurons. However, traditional gradient-based algorithms such as gradient descent cannot be used to train the parameters of neural networks with threshold activations, since the activation function has zero gradient everywhere except at a single non-differentiable point. To this end, we study weight-decay-regularized training problems for deep neural networks with threshold activations. We first show that regularized deep threshold network training problems can be equivalently formulated as standard convex optimization problems, which parallel the LASSO method, provided that the width of the last hidden layer exceeds a certain threshold. We also derive a simplified convex optimization formulation for the case where the dataset can be shattered at a certain layer of the network. We corroborate our theoretical results with various numerical experiments.
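To make the zero-gradient obstacle concrete, the following is a minimal sketch, not taken from the paper, that estimates the gradient of a squared loss through a single threshold neuron by central differences. The names `threshold`, `neuron`, `squared_loss`, and `central_difference`, as well as the fixed input, bias, and target values, are illustrative assumptions.

```python
def threshold(x: float) -> float:
    """Unit-step (threshold) activation: 1 if x >= 0, else 0."""
    return 1.0 if x >= 0.0 else 0.0


def neuron(w: float, x: float = 2.0, b: float = -1.0) -> float:
    """Single threshold neuron with a hypothetical fixed input x and bias b."""
    return threshold(w * x + b)


def squared_loss(w: float, target: float = 1.0) -> float:
    """Squared error between the neuron output and a hypothetical target."""
    return 0.5 * (neuron(w) - target) ** 2


def central_difference(f, x: float, h: float = 1e-4) -> float:
    """Numerical derivative estimate of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# The neuron switches at w = 0.5 (where w * 2.0 - 1.0 = 0).
# Away from that point the loss is piecewise constant in w.
weights = [-1.0, 0.0, 0.3, 0.7, 2.0]
grads = [central_difference(squared_loss, w) for w in weights]

# Straddling the jump, the finite difference blows up instead of
# giving a usable descent direction.
spike = central_difference(squared_loss, 0.5)

for w, g in zip(weights, grads):
    print(f"w = {w:+.1f}  estimated dLoss/dw = {g}")
print(f"w = +0.5  estimated dLoss/dw = {spike}")
```

Every weight away from the switching point yields a gradient estimate of exactly zero, so a gradient descent update would leave the weight unchanged. At the switching point itself the estimate is large only because the step size `h` straddles the discontinuity; the true derivative does not exist there.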