Models pre-trained on large-scale datasets are often fine-tuned to support new tasks and datasets that arrive over time. This process requires storing a separate copy of the model for each task to which the pre-trained model is fine-tuned. Building on recent model patching work, we propose $\Delta$-Patching for fine-tuning neural network models efficiently, without the need to store model copies. To achieve this objective, we introduce a simple and lightweight method called $\Delta$-Networks. Our comprehensive experiments across variants of settings and architectures show that $\Delta$-Networks outperform earlier model patching work while requiring only a fraction of the parameters to be trained. We also show that this approach can be applied to other problem settings, such as transfer learning and zero-shot domain adaptation, and to other tasks, such as detection and segmentation.