Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters

Learning a vision model that remains robust under large distribution shifts is essential for deployment in real-world settings. In particular, domain generalization (DG) algorithms aim to maintain the performance of a trained model on distributions that were not seen during training. One of the most effective approaches has been to leverage the rich knowledge already learned by large pretrained models. However, naively fine-tuning large models on DG tasks is often practically infeasible due to memory limitations, long training times, and the risk of degrading the learned knowledge. Recently, parameter-efficient fine-tuning (PEFT) methods have been proposed to reduce the high computational cost of training and to adapt large models to downstream tasks efficiently. In this work, for the first time, we find that the use of adapters in PEFT methods not only reduces the computational cost of training but also serves as an effective regularizer for DG tasks. Surprisingly, a naive adapter implementation for large models achieves superior performance on common datasets. However, under large distribution shifts, a more sophisticated adapter implementation should account for additional factors, such as the optimal amount of regularization given the strength of the shift. To address this, we propose a mixture-of-experts-based adapter fine-tuning method, dubbed mixture-of-adapters (MoA). Specifically, we employ multiple adapters with varying capacities and use learnable routers to allocate each token to a suitable adapter. By combining PEFT and MoA, we effectively alleviate the performance deterioration caused by distribution shifts and achieve state-of-the-art performance on diverse DG benchmarks.
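
To make the routing idea concrete, the following is a minimal forward-pass sketch of a mixture-of-adapters layer, not the authors' implementation. It assumes low-rank residual adapters whose capacity is set by their rank, a linear softmax router, and top-1 token routing; the abstract specifies none of these choices. The names `LowRankAdapter`, `MixtureOfAdapters`, and `ranks` are hypothetical, and training (including any load-balancing loss) is omitted.

```python
import numpy as np


class LowRankAdapter:
    """Hypothetical adapter: a low-rank update whose capacity is its rank."""

    def __init__(self, dim, rank, rng):
        self.down = rng.normal(0.0, 0.02, size=(dim, rank))
        # Zero-initialised up-projection, so the adapter starts as a no-op
        # and the frozen pretrained features pass through unchanged.
        self.up = np.zeros((rank, dim))

    def delta(self, x):
        # x: (num_tokens, dim) -> low-rank update of the same shape.
        return (x @ self.down) @ self.up


class MixtureOfAdapters:
    """Assumed design: linear router + top-1 routing over adapters."""

    def __init__(self, dim, ranks=(2, 4, 8), seed=0):
        rng = np.random.default_rng(seed)
        # Adapters of varying capacity (here: varying rank, an assumption).
        self.adapters = [LowRankAdapter(dim, r, rng) for r in ranks]
        # Router weights; these would be learned during fine-tuning.
        self.router = rng.normal(0.0, 0.02, size=(dim, len(ranks)))

    def route(self, tokens):
        # Numerically stable softmax over adapters for every token.
        logits = tokens @ self.router
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        return probs / probs.sum(axis=-1, keepdims=True)

    def __call__(self, tokens):
        probs = self.route(tokens)
        choice = probs.argmax(axis=-1)  # top-1 adapter index per token
        gate = np.take_along_axis(probs, choice[:, None], axis=-1)
        delta = np.zeros_like(tokens)
        for i, adapter in enumerate(self.adapters):
            mask = choice == i
            if mask.any():
                delta[mask] = adapter.delta(tokens[mask])
        # Residual connection: frozen features plus the gated adapter update.
        return tokens + gate * delta, choice


# Usage: route 10 tokens of width 16 through three adapters.
moa = MixtureOfAdapters(dim=16)
tokens = np.random.default_rng(1).normal(size=(10, 16))
output, chosen_adapter = moa(tokens)
```

In this sketch only the adapter and router weights would be trainable while the backbone stays frozen, which is where the parameter efficiency comes from. Giving the adapters different ranks lets the router pick a different effective amount of adaptation per token.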
