We analyze the inductive bias of gradient descent for weight-normalized smooth homogeneous neural networks trained with exponential or cross-entropy loss. We study both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate. We extend these results to gradient descent and establish asymptotic relations between weights and gradients for both SWN and EWN. We also show that EWN updates the weights in a way that favors asymptotic relative sparsity. For EWN, we provide a finite-time convergence rate of the loss under gradient flow and a tight asymptotic convergence rate under gradient descent. We demonstrate our results for SWN and EWN on synthetic datasets. Experimental results on simple datasets support our claim about sparse EWN solutions, even with SGD, which demonstrates the potential of EWN for learning neural networks amenable to pruning.
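The abstract does not spell out the two parameterizations, so the following is only a minimal sketch under an assumed, commonly used form: SWN writes a weight vector as w = γ · v / ‖v‖ with a scalar γ, and EWN writes it as w = exp(α) · v / ‖v‖ with a scalar α. The function names `l2_norm`, `swn_weight`, and `ewn_weight` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def swn_weight(gamma, v):
    """SWN (assumed form): w = gamma * v / ||v||, so ||w|| = |gamma|."""
    n = l2_norm(v)
    return [gamma * x / n for x in v]

def ewn_weight(alpha, v):
    """EWN (assumed form): w = exp(alpha) * v / ||v||, so ||w|| = exp(alpha)."""
    n = l2_norm(v)
    scale = math.exp(alpha)
    return [scale * x / n for x in v]

# Illustrative values only: both parameterizations can represent the same weight.
v = [3.0, 4.0]
w_swn = swn_weight(2.0, v)            # scale stored directly as gamma
w_ewn = ewn_weight(math.log(2.0), v)  # scale stored as alpha = log(gamma)
```

Under these assumed forms, the two parameterizations represent the same set of weights with positive scale; they differ in how a gradient step on the scale parameter (γ versus α) changes the weight norm, which is the kind of difference the analysis above is concerned with.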