With the surge of large-scale pre-trained models (PTMs), adapting these models to numerous downstream tasks has become a crucial problem. Consequently, parameter-efficient transfer learning (PETL) of large models has attracted considerable attention. While recent PETL methods show impressive performance, they rely on two optimistic assumptions: 1) the entire parameter set of a PTM is available, and 2) sufficient memory is available for fine-tuning. However, in most real-world applications, PTMs are served as black-box APIs or proprietary software without explicit access to their parameters. Moreover, the large memory requirements of modern PTMs are hard to meet. In this work, we propose black-box visual prompting (BlackVIP), which efficiently adapts PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) the Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent, image-shaped visual prompts, which improve few-shot adaptation and robustness to distribution/location shift. SPSA-GC efficiently estimates the gradient of the target model in order to update the Coordinator. Extensive experiments on 16 datasets demonstrate that BlackVIP enables robust adaptation to diverse domains without access to the PTMs' parameters and with minimal memory requirements. Code: \url{https://github.com/changdaeoh/BlackVIP}
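As a rough illustration of the SPSA idea mentioned in the abstract, the sketch below estimates a gradient from only two loss queries by perturbing all parameters at once with a random ±1 vector. The abstract does not spell out the form of the gradient correction, so the Nesterov-style look-ahead momentum used here is an assumption, as are all names (`spsa_gradient`, `spsa_momentum_minimize`, `quadratic`) and hyperparameter values. The toy quadratic stands in for a black-box model's loss; this is not the BlackVIP implementation.

```python
import random


def spsa_gradient(loss, params, c=0.01, rng=random):
    """Two-query SPSA estimate of the gradient of `loss` at `params`."""
    # Rademacher perturbation: each coordinate is +1 or -1 with equal probability.
    delta = [rng.choice((-1.0, 1.0)) for _ in params]
    plus = [p + c * d for p, d in zip(params, delta)]
    minus = [p - c * d for p, d in zip(params, delta)]
    # Only two loss evaluations are needed, whatever the number of parameters.
    diff = (loss(plus) - loss(minus)) / (2.0 * c)
    # For +/-1 entries, dividing by delta_i is the same as multiplying by it.
    return [diff * d for d in delta]


def spsa_momentum_minimize(loss, params, steps=500, lr=0.01, c=0.01,
                           beta=0.5, seed=0):
    """Minimize `loss` with SPSA estimates and look-ahead momentum (assumed form)."""
    rng = random.Random(seed)
    params = list(params)
    momentum = [0.0] * len(params)
    for _ in range(steps):
        # Estimate the gradient at the look-ahead point rather than at params.
        lookahead = [p + beta * m for p, m in zip(params, momentum)]
        grad = spsa_gradient(loss, lookahead, c=c, rng=rng)
        momentum = [beta * m - lr * g for m, g in zip(momentum, grad)]
        params = [p + m for p, m in zip(params, momentum)]
    return params


def quadratic(x):
    """Hypothetical black-box loss with its minimum at x_i = 3 for every i."""
    return sum((xi - 3.0) ** 2 for xi in x)


result = spsa_momentum_minimize(quadratic, [0.0, 0.0, 0.0, 0.0])
print(result)
```

The optimizer only ever calls `loss` as a function and never inspects its internals, which mirrors the black-box setting: the same loop would apply if `loss` wrapped queries to a remote model API.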