Regularization is a set of techniques used to improve the generalization ability of deep neural networks. In this paper, we introduce weight compander (WC), a novel and effective method that improves generalization by reparameterizing each weight in a deep neural network using a nonlinear function. It is general, intuitive, cheap, and easy to implement, and it can be combined with various other regularization techniques. Large weights in deep neural networks are a sign of a more complex network that is overfitted to the training data. Moreover, regularized networks tend to have a greater range of weights around zero, with fewer weights centered at zero. We introduce a weight reparameterization function that is applied to each weight and implicitly reduces overfitting by restricting the magnitude of the weights while simultaneously forcing them away from zero. This leads to more democratic decision-making in the network. First, individual weights cannot have too much influence on the prediction, because their magnitude is restricted. Second, more weights take part in the prediction, because they are forced away from zero during training. This promotes the extraction of more features from the input data and increases weight redundancy, which makes the network less sensitive to statistical differences between training and test data. We extend our method to learn the hyperparameters of the weight reparameterization function. This avoids a hyperparameter search and lets the network align the weight reparameterization with the training progress. We show experimentally that using weight compander in addition to standard regularization methods improves the performance of neural networks.
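To make the idea of a per-weight reparameterization concrete, the following is a minimal sketch and not the definitive formulation. The abstract does not state the nonlinear function, so the scaled arctangent used here, the function names `compand`, `compand_grad`, and `linear_forward`, and the default values of the hyperparameters `a` and `b` are all illustrative assumptions. The sketch shows two properties that match the description above: the effective weight is bounded in magnitude by `a * pi / 2`, and when `a > b` the slope at zero exceeds 1, so small raw parameters map to larger effective weights.

```python
import math

# Assumption: a scaled arctangent stands in for the nonlinear reparameterization.
# The abstract does not specify the function; a and b are hypothetical hyperparameters.

def compand(v, a=1.0, b=0.5):
    """Map a raw parameter v to an effective weight a * atan(v / b)."""
    return a * math.atan(v / b)

def compand_grad(v, a=1.0, b=0.5):
    """Derivative of compand with respect to v: (a / b) / (1 + (v / b)^2)."""
    return (a / b) / (1.0 + (v / b) ** 2)

def linear_forward(raw_weights, inputs, a=1.0, b=0.5):
    """Dot product that uses companded weights instead of the raw parameters."""
    return sum(compand(v, a, b) * x for v, x in zip(raw_weights, inputs))
```

In a training setup, the optimizer would update the raw parameters, while the forward pass only ever sees the companded values. Making `a` and `b` trainable would correspond to the learned-hyperparameter extension mentioned above, although the exact parameterization here remains an assumption.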