Feature normalization transforms such as batch normalization and layer normalization have become indispensable ingredients of state-of-the-art deep neural networks. Recent studies on fine-tuning large pretrained models indicate that tuning only the affine parameters of these transforms can achieve high accuracy on downstream tasks. These findings raise the question of how much expressive power is gained by tuning only the normalization layers of a frozen network. In this work, we take a first step toward answering this question and show that, for random ReLU networks, fine-tuning only their normalization layers can reconstruct any target network that is $O(\sqrt{\text{width}})$ times smaller. We further show that this holds even for randomly sparsified networks, given sufficient overparameterization, in agreement with prior empirical work.
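To make "tuning only the normalization layers of a frozen network" concrete, here is a minimal NumPy sketch. It is an illustration under our own assumptions, not the paper's reconstruction argument. A random first layer `W` and a random readout `v` are frozen, and only the per-neuron scale `gamma` and shift `beta` that follow a normalization step are trained by gradient descent. The target is a smaller random ReLU network. Normalization statistics are computed once over the whole (fixed) dataset. An optional fixed random mask sparsifies `W`. The function name `fit_norm_affine_only`, all sizes, the learning rate, and the target network are hypothetical choices.

```python
import numpy as np


def fit_norm_affine_only(n=128, d=5, width=256, target_width=4,
                         sparsity=0.0, steps=1000, lr=0.2, seed=0):
    """Hypothetical sketch: fit a frozen random ReLU network to a smaller random
    target by training ONLY the affine parameters (gamma, beta) that follow a
    normalization step. Returns (initial_loss, final_loss)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))

    # Assumed target: a small random one-hidden-layer ReLU network.
    A = rng.standard_normal((d, target_width))
    u = rng.standard_normal(target_width) / np.sqrt(target_width)
    y = np.maximum(X @ A, 0.0) @ u

    # Frozen random first layer, optionally sparsified by a fixed random mask.
    W = rng.standard_normal((d, width))
    W = W * (rng.random((d, width)) >= sparsity)
    Z = X @ W
    # Standardize each feature with dataset statistics (held fixed); the small
    # epsilon avoids division by zero for columns the mask zeroed out entirely.
    Z_hat = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-5)

    # Frozen random readout with +-1/sqrt(width) weights.
    v = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

    # The only trainable parameters: the normalization layer's affine transform.
    gamma = np.ones(width)
    beta = np.zeros(width)

    def loss_and_grads(gamma, beta):
        pre = Z_hat * gamma + beta               # affine part of normalization
        hidden = np.maximum(pre, 0.0)            # ReLU
        err = hidden @ v - y
        loss = 0.5 * np.mean(err ** 2)
        # Backpropagate through the frozen readout and the ReLU.
        d_pre = (err[:, None] / n) * v * (pre > 0)
        return loss, (d_pre * Z_hat).sum(axis=0), d_pre.sum(axis=0)

    initial_loss, _, _ = loss_and_grads(gamma, beta)
    for _ in range(steps):
        _, g_gamma, g_beta = loss_and_grads(gamma, beta)
        gamma -= lr * g_gamma                    # W and v are never updated
        beta -= lr * g_beta
    final_loss, _, _ = loss_and_grads(gamma, beta)
    return initial_loss, final_loss


if __name__ == "__main__":
    for s in (0.0, 0.5):
        init, final = fit_norm_affine_only(sparsity=s)
        print(f"sparsity={s}: loss {init:.4f} -> {final:.4f}")
```

Only `2 * width` numbers are ever updated, while `W` and `v` stay at their random initialization. Changing `sparsity` lets one informally probe the sparsified setting. The sketch makes no claim about the $O(\sqrt{\text{width}})$ size ratio.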