Fine-tuning pre-trained vision models for specific tasks is common practice in computer vision, but it becomes increasingly expensive as models grow larger. Recently, parameter-efficient fine-tuning (PEFT) methods have emerged as a popular solution, improving training efficiency and reducing storage needs by tuning only small additional low-rank modules within pre-trained backbones. Despite these advantages, such methods struggle with limited representational capacity and misalignment with pre-trained intermediate features. To address these issues, we introduce Multi-Kernel Kronecker Adaptation with Re-Scaling Transmission (KARST) for various recognition tasks. Specifically, its multi-kernel design extends Kronecker projections horizontally and separates the adaptation matrices into multiple complementary spaces, reducing parameter dependency and creating more compact subspaces. In addition, KARST incorporates extra learnable re-scaling factors that better align with pre-trained feature distributions, allowing more flexible and balanced feature aggregation. Extensive experiments validate that KARST outperforms other PEFT counterparts at negligible inference cost, thanks to its re-parameterization property. Code is publicly available at: https://github.com/Lucenova/KARST.
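The core idea sketched above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: all names, shapes, the number of kernels `K`, and the factor initializations are hypothetical. It models the adaptation as a sum of re-scaled Kronecker products of small factor matrices, then shows the re-parameterization step that merges the update into the frozen weight so inference incurs no extra cost.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 16, 16   # hypothetical layer dimensions
K = 2                  # hypothetical number of Kronecker kernels

# Kronecker factor shapes: (4x4) kron (4x4) -> (16x16) matches the layer
W = rng.normal(size=(d_out, d_in))                          # frozen pre-trained weight
A = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(K)]  # learnable factors
B = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(K)]  # learnable factors
s = np.ones(K)                                              # learnable re-scaling factors

# Multi-kernel adaptation: sum of re-scaled Kronecker products,
# each kernel spanning a complementary subspace of the update.
delta_W = sum(s[k] * np.kron(A[k], B[k]) for k in range(K))

x = rng.normal(size=(d_in,))
y_train = W @ x + delta_W @ x   # training-time forward: frozen path + adapter path

# Re-parameterization: fold the adapter into W once after training,
# so inference uses a single matrix multiply with no added cost.
W_merged = W + delta_W
y_infer = W_merged @ x

assert np.allclose(y_train, y_infer)
```

The equality of `y_train` and `y_infer` is what makes the inference cost negligible: because the adapter is linear in the input, it can be absorbed into the backbone weight exactly.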