This paper introduces a novel theoretical framework for analyzing vector-valued neural networks through the development of vector-valued variation spaces, a new class of reproducing kernel Banach spaces. These spaces emerge from studying the regularization effect of weight decay when training networks with activations such as the rectified linear unit (ReLU). The framework offers a deeper understanding of multi-output networks and their function-space characteristics. A key contribution of this work is a representer theorem for the vector-valued variation spaces. This representer theorem establishes that shallow vector-valued neural networks are solutions to data-fitting problems over these infinite-dimensional spaces, with network widths bounded by the square of the number of training samples. This result reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks, shedding new light on multi-task learning with neural networks. Finally, this paper develops a connection between weight-decay regularization and the multi-task lasso problem. This connection leads to novel bounds on layer widths in deep networks that depend on the intrinsic dimensions of the training data representations. This insight not only deepens the understanding of deep network architectural requirements, but also yields a simple convex optimization method for deep neural network compression. The performance of this compression procedure is evaluated on various architectures.
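To make the weight-decay / multi-task-lasso connection concrete, the following is a minimal sketch in notation of our own choosing (not taken from the paper), for a shallow vector-valued ReLU network \(f(x) = \sum_{k=1}^{K} v_k\,\sigma(w_k^\top x)\) with input weights \(w_k\) and output-weight vectors \(v_k\). Because the ReLU \(\sigma\) is positively homogeneous, the rescaling \((w_k, v_k) \mapsto (\alpha w_k, v_k/\alpha)\), \(\alpha > 0\), leaves \(f\) unchanged, and minimizing the weight-decay penalty over this rescaling gives
\[
\min_{\{(w_k, v_k)\}} \sum_{i=1}^{N} \ell\big(y_i, f(x_i)\big) + \frac{\lambda}{2} \sum_{k=1}^{K} \big(\|w_k\|_2^2 + \|v_k\|_2^2\big)
\;=\;
\min_{\substack{\{(w_k, v_k)\} \\ \|w_k\|_2 = 1}} \sum_{i=1}^{N} \ell\big(y_i, f(x_i)\big) + \lambda \sum_{k=1}^{K} \|v_k\|_2 ,
\]
since \(\|w_k\|_2^2 + \|v_k\|_2^2 \ge 2\,\|w_k\|_2\,\|v_k\|_2\), with equality under balanced scaling. The right-hand penalty \(\sum_{k} \|v_k\|_2\) is a group-sparsity norm of the kind used in the multi-task lasso: it drives entire output-weight vectors, and hence entire neurons, to zero, which is the mechanism behind both the width bounds and the convex compression procedure summarized above.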