Existing unsupervised domain adaptation (UDA) pipelines fine-tune an already well-trained backbone for every new source-target pair, so the number of trainable parameters and the storage footprint grow linearly with each new pair, and the well-trained backbone parameters cannot be reused. Inspired by recent findings that existing backbones carry textural biases, we propose VirDA, which exploits domain-specific textural bias for domain adaptation via visual reprogramming. Instead of fine-tuning the full backbone, VirDA prepends a domain-specific visual reprogramming layer to the backbone. This layer produces visual prompts that act as an added textural bias on the input image, adapting its "style" to the target domain. To optimize these visual reprogramming layers, we use multiple objective functions that account for intra- and inter-domain distribution differences once the domain-adapting visual prompts are applied. The backbone parameters are never modified, so the same backbone can be reused across different domains. We evaluate VirDA on Office-31 and obtain 92.8% mean accuracy with only 1.5M trainable parameters. VirDA surpasses PDA, the state-of-the-art parameter-efficient UDA baseline, by +1.6% accuracy while using just 46% of its parameters. Compared with full-backbone fine-tuning, VirDA outperforms CDTrans and FixBi by +0.2% and +1.4%, respectively, while requiring only 1.7% and 2.8% of their trainable parameters. Relative to the strongest current methods (PMTrans and TVT), VirDA uses ~1.7% of their parameters and trades off only 2.2% and 1.1% accuracy, respectively.
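To make the setup concrete, below is a minimal sketch of the frozen-backbone-plus-reprogramming idea. It assumes a PyTorch-style interface; the module names (`ReprogrammingLayer`, `VirDAWrapper`), the additive-prompt formulation, the ResNet-50 backbone, and the linear-MMD alignment loss are illustrative assumptions, not the paper's exact architecture or objectives.

```python
# Sketch (assumptions noted above): a frozen backbone is reused across domains;
# only the small per-domain reprogramming layers and the classifier head train.
import torch
import torch.nn as nn
import torchvision.models as models


class ReprogrammingLayer(nn.Module):
    """Hypothetical prompt generator: a shallow conv net whose output is
    added to the input image as an extra textural bias ("restyling")."""
    def __init__(self, channels: int = 3, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # reprogrammed image fed to the frozen backbone


class VirDAWrapper(nn.Module):
    """Frozen backbone shared by all domains; one reprogramming layer per domain."""
    def __init__(self, num_classes: int = 31):
        super().__init__()
        self.backbone = models.resnet50(weights="IMAGENET1K_V2")
        self.backbone.fc = nn.Identity()          # expose pooled 2048-d features
        for p in self.backbone.parameters():      # backbone stays untouched
            p.requires_grad = False
        self.reprog_src = ReprogrammingLayer()    # source-domain prompts
        self.reprog_tgt = ReprogrammingLayer()    # target-domain prompts
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, x: torch.Tensor, domain: str = "source"):
        layer = self.reprog_src if domain == "source" else self.reprog_tgt
        feats = self.backbone(layer(x))
        return feats, self.classifier(feats)


def mmd_loss(f_src: torch.Tensor, f_tgt: torch.Tensor) -> torch.Tensor:
    """Illustrative inter-domain objective: match feature means (linear MMD)."""
    return (f_src.mean(dim=0) - f_tgt.mean(dim=0)).pow(2).sum()


# One optimization step: gradients reach only the reprogramming layers and head.
model = VirDAWrapper()
opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)
x_s, y_s = torch.randn(8, 3, 224, 224), torch.randint(0, 31, (8,))
x_t = torch.randn(8, 3, 224, 224)
f_s, logits_s = model(x_s, domain="source")
f_t, _ = model(x_t, domain="target")
loss = nn.functional.cross_entropy(logits_s, y_s) + mmd_loss(f_s, f_t)
loss.backward()
opt.step()
```

Because the backbone gradients are never applied, adapting to a new source-target pair only adds the lightweight reprogramming layers (and a head), which is where the linear-growth savings claimed in the abstract come from.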