Pre-trained models have become indispensable for efficiently building models across a broad spectrum of downstream tasks. Their advantages have been highlighted by empirical studies on scaling laws, which demonstrate that larger pre-trained models can significantly reduce the sample complexity of downstream learning. However, existing theoretical analyses of pre-trained models cannot explain this phenomenon. In this paper, we address this gap by introducing a novel framework, caulking, inspired by parameter-efficient fine-tuning (PEFT) methods such as adapter-based fine-tuning, low-rank adaptation, and partial fine-tuning. Our analysis establishes that improved pre-trained models provably decrease the sample complexity of downstream tasks. This offers theoretical justification for the empirically observed scaling laws relating pre-trained model size to downstream performance, a relationship not covered by existing results.
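To make the PEFT setting named above concrete, the following is a minimal sketch of low-rank adaptation (LoRA): the pre-trained weight is frozen and only a low-rank update is trained on the downstream task. This is illustrative only and is not the paper's caulking framework; the rank r and scaling alpha below are standard LoRA hyperparameters, not quantities defined in this paper.

```python
# Minimal LoRA sketch: frozen pre-trained weight W0 plus a trainable
# low-rank update B @ A. Only A and B would be trained downstream.
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r = 64, 64, 4                 # r << min(d_in, d_out): low-rank bottleneck
W0 = rng.standard_normal((d_out, d_in))    # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection, zero-initialized
alpha = 8.0                                # scaling factor for the update

def adapted_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass through the frozen weight plus the scaled low-rank update."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
print(adapted_forward(x).shape)  # (64,)
```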