We present PSiLON Net, an MLP architecture that applies $L_1$ weight normalization to each weight vector and shares the length parameter across the layer. The 1-path-norm bounds the Lipschitz constant of a neural network and reflects its generalizability. We show that PSiLON Net's design drastically simplifies the 1-path-norm while providing an inductive bias towards efficient learning and near-sparse parameters. Where exact sparsity is desired, we propose a pruning method that achieves it in the final stages of training. To exploit the inductive bias of residual networks, we present a simplified residual block that leverages concatenated ReLU activations. For networks constructed with such blocks, we prove that considering only a subset of the possible paths in the 1-path-norm is sufficient to bound the Lipschitz constant. Using the 1-path-norm and this improved bound as regularizers, we conduct experiments in the small data regime using overparameterized PSiLON Nets and PSiLON ResNets, demonstrating reliable optimization and strong performance.
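To make the simplification concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each "weight vector" is one row of a layer's weight matrix (the incoming weights of a unit), that biases are ignored, and that the 1-path-norm is the sum over all input-output paths of the product of absolute weights. The names `l1_normalized_weights`, `one_path_norm`, `sizes`, and `lengths` are hypothetical. Under these assumptions every row of a layer has $L_1$ norm equal to the shared length $g_\ell$, so the path sum collapses to the output width times $\prod_\ell g_\ell$; the exact expression used in the paper may differ.

```python
import numpy as np


def l1_normalized_weights(v, g):
    """Rescale every row of v to have L1 norm g (one length shared by the layer)."""
    row_norms = np.abs(v).sum(axis=1, keepdims=True)
    return g * v / row_norms


def one_path_norm(weights):
    """Sum over all input-output paths of the product of |weights| (biases ignored)."""
    acc = np.ones(weights[0].shape[1])  # one entry per input unit
    for w in weights:
        acc = np.abs(w) @ acc  # propagate absolute path sums one layer forward
    return acc.sum()


rng = np.random.default_rng(0)
sizes = [5, 8, 8, 3]       # hypothetical layer widths: input, two hidden, output
lengths = [1.5, 0.7, 2.0]  # hypothetical shared length parameter for each layer

weights = [
    l1_normalized_weights(rng.normal(size=(n_out, n_in)), g)
    for n_in, n_out, g in zip(sizes[:-1], sizes[1:], lengths)
]

path_norm = one_path_norm(weights)
# With row-wise L1 normalization, each layer multiplies the running sums by g,
# so the full path sum reduces to (output width) * product of the lengths.
closed_form = sizes[-1] * np.prod(lengths)
print(path_norm, closed_form)
```

In this sketch the regularizer depends only on the per-layer lengths, which is one way to read the claim that the design simplifies the 1-path-norm: the matrix-product computation can be replaced by a product of a few scalars.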