Fine-tuning large pre-trained models is a common practice in machine learning applications, yet its mathematical analysis remains largely unexplored. In this paper, we study fine-tuning through the lens of memorization capacity. Our new measure, the Fine-Tuning Capacity (FTC), is defined as the maximum number of samples a neural network can fine-tune, or equivalently, as the minimum number of neurons ($m$) needed to arbitrarily change $N$ labels among $K$ samples considered in the fine-tuning process. In essence, FTC extends the memorization capacity concept to the fine-tuning scenario. We analyze FTC for the additive fine-tuning scenario where the fine-tuned network is defined as the summation of the frozen pre-trained network $f$ and a neural network $g$ (with $m$ neurons) designed for fine-tuning. When $g$ is a ReLU network with either 2 or 3 layers, we obtain tight upper and lower bounds on FTC; we show that $N$ samples can be fine-tuned with $m=\Theta(N)$ neurons for 2-layer networks, and with $m=\Theta(\sqrt{N})$ neurons for 3-layer networks, no matter how large $K$ is. Our results recover the known memorization capacity results when $N = K$ as a special case.
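For concreteness, the additive setting described above can be written as follows (an illustrative formalization; the notation $(x_i, y_i)$ for the $K$ fine-tuning samples and $y_i'$ for the $N$ modified targets is ours):
\[
f_{\mathrm{ft}}(x) = f(x) + g(x), \qquad
f_{\mathrm{ft}}(x_i) = y_i' \ \ (1 \le i \le N), \qquad
f_{\mathrm{ft}}(x_i) = y_i \ \ (N < i \le K),
\]
where $f$ is frozen and $g$ is a ReLU network with $m$ neurons; under this reading, the FTC is the smallest $m$ for which such a $g$ exists for every admissible choice of the modified targets $y_i'$.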