Large-scale pre-trained vision models (PVMs) have shown great potential for adapting to a wide range of downstream vision tasks. However, as state-of-the-art PVMs grow to billions or even trillions of parameters, the standard full fine-tuning paradigm is becoming unsustainable because of its high computational and storage demands. In response, researchers are exploring parameter-efficient fine-tuning (PEFT), which seeks to exceed the performance of full fine-tuning while modifying only a minimal number of parameters. This survey provides a comprehensive overview of visual PEFT and its future directions, systematically reviewing the latest advancements. First, we give a formal definition of PEFT and discuss model pre-training methods. We then divide existing methods into three categories: addition-based, partial-based, and unified-based. Finally, we introduce the commonly used datasets and applications and suggest potential future research challenges. A comprehensive collection of resources is available at https://github.com/synbol/Awesome-Parameter-Efficient-Transfer-Learning.
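To make the phrase "minimal parameter modifications" concrete, the following is a minimal arithmetic sketch, not taken from the survey itself. It assumes one common addition-based design, a bottleneck adapter (a down-projection followed by an up-projection) attached to a frozen linear layer. The width of 768, the bottleneck size of 8, and all function names are hypothetical choices made for illustration only.

```python
def linear_params(d_in, d_out, bias=True):
    """Parameter count of a dense layer mapping d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)


def bottleneck_adapter_params(d_model, r):
    """Parameter count of a hypothetical bottleneck adapter:
    down-projection d_model -> r, then up-projection r -> d_model, both with bias."""
    return linear_params(d_model, r) + linear_params(r, d_model)


def trainable_fraction(trainable, frozen):
    """Share of all parameters that receive gradient updates."""
    return trainable / (trainable + frozen)


# Assumed sizes: a 768-wide layer and a bottleneck of 8 (illustrative only).
d_model, r = 768, 8
frozen = linear_params(d_model, d_model)          # kept fixed during tuning
adapter = bottleneck_adapter_params(d_model, r)   # the only trained weights
fraction = trainable_fraction(adapter, frozen)

print(f"frozen={frozen}, adapter={adapter}, trainable share={fraction:.2%}")
```

Under these assumed sizes, the adapter adds roughly 2% of the frozen layer's parameter count. This kind of ratio is what motivates PEFT; whether accuracy matches or exceeds full fine-tuning is an empirical question that this sketch does not address.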