$L_{p}$-norm regularization schemes such as $L_{0}$, $L_{1}$, and $L_{2}$-norm regularization, and $L_{p}$-norm-based techniques such as weight decay and group LASSO, compute a quantity that depends on model weights considered in isolation from one another. This paper describes a novel regularizer that is not based on an $L_{p}$-norm. In contrast with $L_{p}$-norm-based regularization, this regularizer is concerned with the spatial arrangement of weights within a weight matrix. The regularizer is an additive term in the loss function; it is differentiable, simple and fast to compute, and scale-invariant, requires a trivial amount of additional memory, and can easily be parallelized. Empirically, this method yields approximately an order-of-magnitude reduction in the number of nonzero model parameters at a given level of accuracy.
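The claim that $L_{p}$-norm penalties treat weights in isolation can be checked directly: permuting the entries of a weight matrix leaves every such penalty unchanged, so these penalties cannot distinguish a clustered arrangement of nonzeros from a scattered one. The following is a minimal sketch of that observation only. It does not implement the regularizer proposed here, which the abstract does not specify, and the names `lp_penalty` and `shuffled_copy` are hypothetical helpers introduced for illustration.

```python
import math
import random


def lp_penalty(weights, p):
    """Entrywise penalty on a matrix given as a list of rows.

    p == 0 counts nonzero entries; p > 0 returns the sum of |w|^p
    (the p-th power of the L_p norm, with no root taken).
    """
    flat = [w for row in weights for w in row]
    if p == 0:
        return sum(1 for w in flat if w != 0)
    return sum(abs(w) ** p for w in flat)


def shuffled_copy(weights, seed=0):
    """Return a matrix with the same entries in a shuffled spatial arrangement."""
    cols = len(weights[0])
    flat = [w for row in weights for w in row]
    random.Random(seed).shuffle(flat)
    return [flat[i:i + cols] for i in range(0, len(flat), cols)]


# Illustrative matrix (an assumption, not data from the paper):
# the nonzero weights are clustered in the top-left 2x2 block.
W = [
    [0.5, -1.0, 0.0, 0.0],
    [2.0, 0.25, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
W_shuffled = shuffled_copy(W)

# Each penalty depends only on the multiset of entries, so the two
# arrangements score the same for every p.
for p in (0, 1, 2):
    original = lp_penalty(W, p)
    permuted = lp_penalty(W_shuffled, p)
    print(f"p={p}: original={original}, shuffled={permuted}")
    assert math.isclose(original, permuted)
```

Any penalty that is sensitive to where the nonzero weights sit in the matrix must therefore use information beyond the entrywise magnitudes that these functions consume.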