Foundation models have achieved great advances in multi-task learning by providing a unified interface for unimodal and multimodal tasks. However, the potential of such multi-task learners has not been exploited during transfer learning. In this work, we present a universal parameter-efficient transfer learning method, termed Predict-Interpolate Tuning ($\pi$-Tuning), for vision, language, and vision-language tasks. It aggregates the parameters of lightweight task-specific experts learned from similar tasks to aid the target downstream task. The task similarities are predicted in a unified modality-independent space, yielding a scalable graph that captures task relationships. $\pi$-Tuning has several appealing benefits. First, it flexibly explores both intra- and inter-modal transferability between similar tasks to improve the accuracy and robustness of transfer learning, especially in data-scarce scenarios. Second, it offers a systematic solution for transfer learning based on multi-task prediction followed by interpolation, and it is compatible with diverse types of parameter-efficient experts, such as prompts and adapters. Third, an extensive study of task-level mutual benefits on 14 unimodal and 6 multimodal datasets shows that $\pi$-Tuning surpasses fine-tuning and other parameter-efficient transfer learning methods in both full-shot and low-shot regimes. The task graph also enables an in-depth, interpretable analysis of task transferability across modalities. The code will be available at https://github.com/TencentARC/pi-Tuning.
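The predict-then-interpolate idea described in the abstract can be illustrated with a minimal sketch. This is not the released implementation. It assumes that each task already has an embedding in a shared space, that similarity is measured by cosine similarity, and that experts are flat parameter vectors combined with fixed convex weights. The function names, task names, and all numbers are hypothetical and chosen only for illustration; the actual method may predict similarities and set interpolation weights differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_similar_tasks(target, embeddings, k):
    """Predict step: rank all other tasks by similarity to the target, keep the top k.

    `embeddings` maps a task name to its (assumed, precomputed) embedding.
    Returns a list of (task_name, similarity) pairs, most similar first.
    """
    scores = [
        (name, cosine(embeddings[target], emb))
        for name, emb in embeddings.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def interpolate_experts(target_expert, aux_experts, weights):
    """Interpolate step: convex combination of expert parameter vectors.

    `weights[0]` applies to the target expert, the rest to `aux_experts` in order.
    Fixed weights are an assumption made to keep the sketch short.
    """
    experts = [target_expert] + aux_experts
    if len(weights) != len(experts):
        raise ValueError("need exactly one weight per expert")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [
        sum(w * expert[i] for w, expert in zip(weights, experts))
        for i in range(len(target_expert))
    ]


# Hypothetical task embeddings in a shared, modality-independent space.
task_embeddings = {
    "vqa": [1.0, 0.0],
    "captioning": [0.9, 0.1],
    "sentiment": [0.0, 1.0],
    "nli": [0.1, 0.9],
}

# Hypothetical expert parameters (e.g. flattened adapter or prompt weights).
expert_params = {
    "vqa": [0.0, 2.0],
    "captioning": [2.0, 0.0],
    "sentiment": [4.0, 4.0],
    "nli": [6.0, 6.0],
}

neighbors = predict_similar_tasks("vqa", task_embeddings, k=1)
merged = interpolate_experts(
    expert_params["vqa"],
    [expert_params[name] for name, _ in neighbors],
    weights=[0.5, 0.5],
)
```

Under these assumptions, the merged parameters would serve as the initialization for further tuning on the target task. In this sketch the nearest neighbor of the hypothetical `vqa` task is `captioning`, so only that expert is mixed into the target expert.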