Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of finding the sparsest interpolating ReLU network--i.e., the network with the fewest nonzero parameters or neurons--a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest single-hidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing $\ell^p$ quasinorms of the weights for $0 < p < 1$, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.
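As a rough illustration of why an $\ell^p$ penalty with $0 < p < 1$ favors sparse interpolants, the sketch below compares two single-hidden-layer ReLU networks that fit the same toy dataset. The abstract does not state the exact objective, so the penalty form used here (the sum of $|\theta|^p$ over all weights and biases), the toy data, and the names `network`, `flatten`, `lp_penalty`, and `count_nonzero` are all assumptions made for illustration, not the paper's formulation.

```python
# Illustrative sketch only. The penalty form and every name below are
# assumptions; the abstract does not give the paper's exact objective.

def relu(z):
    return max(z, 0.0)

def network(x, W, b, v):
    """Output of f(x) = sum_k v[k] * relu(<W[k], x> + b[k]) for one input x."""
    total = 0.0
    for w_k, b_k, v_k in zip(W, b, v):
        pre = sum(w * xi for w, xi in zip(w_k, x)) + b_k
        total += v_k * relu(pre)
    return total

def flatten(W, b, v):
    """Collect all input weights, biases, and output weights in one list."""
    return [w for w_k in W for w in w_k] + list(b) + list(v)

def lp_penalty(params, p):
    """Sum of |theta|^p, i.e. the l^p quasinorm raised to the power p."""
    return sum(abs(t) ** p for t in params)

def count_nonzero(params):
    return sum(1 for t in params if t != 0.0)

# Toy data: y = relu(x) sampled at three points.
X = [[0.0], [1.0], [2.0]]
Y = [0.0, 1.0, 2.0]

# Two interpolating networks: one neuron vs. two half-weighted copies of it.
sparse = ([[1.0]], [0.0], [1.0])
dense = ([[1.0], [1.0]], [0.0, 0.0], [0.5, 0.5])

for name, net in [("sparse", sparse), ("dense", dense)]:
    fits = all(abs(network(x, *net) - y) < 1e-12 for x, y in zip(X, Y))
    theta = flatten(*net)
    print(name, fits, count_nonzero(theta), lp_penalty(theta, 0.5))
```

Both networks fit the data, but the one-neuron network has fewer nonzero parameters and a smaller penalty at $p = 0.5$. This single comparison only illustrates the tendency; the guarantee that global minimizers are the sparsest interpolants is the paper's theorem and is not demonstrated by the sketch.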

