The intermediate layers of deep networks can be characterised as a Gaussian process; in particular, the Edge-of-Chaos (EoC) initialisation strategy prescribes the limiting covariance matrix of this Gaussian process. Here we show that the under-utilised choice of the Gaussian process variance is important in training deep networks with sparsity-inducing activations, such as the shifted and clipped ReLU, $\text{CReLU}_{\tau,m}(x)=\min(\max(x-\tau,0),m)$. Specifically, initialisations leading to larger fixed Gaussian process variances allow for improved expressivity at activation sparsity as high as 90% in DNNs and CNNs, and generally improve the stability of training. Enabling full, or near-full, accuracy at such high levels of hidden-layer sparsity suggests a promising mechanism for reducing the energy consumption of machine learning models involving fully connected layers.
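For concreteness, a minimal NumPy sketch of the shifted and clipped ReLU defined above; the shift $\tau$ and clip level $m$ are the parameters from the formula, and the sparsity estimate assumes standard-normal pre-activations (an illustrative assumption, not a claim about any particular network):

```python
import numpy as np

def crelu(x, tau, m):
    """Shifted and clipped ReLU: CReLU_{tau,m}(x) = min(max(x - tau, 0), m).

    Shifting by tau > 0 zeroes out a larger fraction of pre-activations
    (inducing sparsity); clipping at m bounds the output.
    """
    return np.minimum(np.maximum(x - tau, 0.0), m)

# Example: for standard-normal pre-activations, tau controls the sparsity level.
x = np.random.default_rng(0).standard_normal(10_000)
y = crelu(x, tau=1.28, m=1.0)
print(f"activation sparsity: {np.mean(y == 0):.1%}")  # ~90%, since Phi(1.28) ≈ 0.90
```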
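To illustrate what the limiting Gaussian process variance refers to, below is a sketch of the standard mean-field variance recursion for a fully connected layer, $q^{l} = \sigma_b^2 + \sigma_w^2\,\mathbb{E}_{z\sim\mathcal{N}(0,1)}\!\left[\phi\!\left(\sqrt{q^{l-1}}\,z\right)^2\right]$, iterated to its fixed point $q^*$. This is the textbook recursion from the EoC literature, not this paper's specific prescription, and the $\sigma_w^2$, $\sigma_b^2$ values are illustrative assumptions:

```python
import numpy as np

def variance_map(q, sigma_w2, sigma_b2, phi, n_samples=1_000_000, seed=0):
    """One step of the mean-field variance recursion
    q_l = sigma_b^2 + sigma_w^2 * E_{z~N(0,1)}[phi(sqrt(q_{l-1}) * z)^2],
    estimated by Monte Carlo (fixed seed keeps the iteration deterministic)."""
    z = np.random.default_rng(seed).standard_normal(n_samples)
    return sigma_b2 + sigma_w2 * np.mean(phi(np.sqrt(q) * z) ** 2)

def fixed_point_variance(sigma_w2, sigma_b2, phi, q0=1.0, iters=50):
    """Iterate the variance map to (an estimate of) its fixed point q*."""
    q = q0
    for _ in range(iters):
        q = variance_map(q, sigma_w2, sigma_b2, phi)
    return q

phi = lambda x: np.minimum(np.maximum(x - 1.28, 0.0), 1.0)  # CReLU_{1.28, 1}
# Hypothetical hyperparameters, chosen only to demonstrate the recursion:
print(fixed_point_variance(sigma_w2=2.0, sigma_b2=0.5, phi=phi))
```

Larger fixed-point variances $q^*$ correspond to the initialisations the abstract describes as improving expressivity and training stability under high activation sparsity.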