Deep convolutional networks are ubiquitous in computer vision, owing to their strong performance across a wide range of tasks and domains. Models are, however, often trained in isolation for each task, failing to exploit relatedness between tasks and domains to learn more compact models that generalise better in low-data regimes. Multi-domain learning aims to handle related tasks, such as image classification across multiple domains, simultaneously. Previous work on this problem explored combining a pre-trained, fixed, domain-agnostic base network with smaller learnable domain-specific adaptation modules. In this paper, we introduce Modulation Adapters, which multiplicatively update the model's convolutional filter weights for each task. Parameterising these adaptation weights in a factored form lets us flexibly scale the number of per-task parameters and strike different parameter-accuracy trade-offs. We evaluate our approach on the Visual Decathlon challenge, composed of ten image classification tasks across different domains, and on the ImageNet-to-Sketch benchmark, which consists of six image classification tasks. Our approach achieves accuracies that are comparable to or better than those of existing state-of-the-art approaches.
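To make the idea of a multiplicative, factored weight update concrete, the following is a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify the exact factorisation, so the low-rank form `A @ B`, the one-scalar-per-kernel-slice scaling, the identity-style initialisation, and all function names (`init_factors`, `modulate`, `adapter_param_count`) are assumptions made for illustration only.

```python
import numpy as np


def init_factors(c_out, c_in, rank):
    """Hypothetical init: factors A (c_out x rank) and B (rank x c_in)
    chosen so that A @ B is all ones, i.e. the base weights are unchanged."""
    a = np.full((c_out, rank), 1.0 / rank)
    b = np.ones((rank, c_in))
    return a, b


def modulate(base_weights, a, b):
    """Scale each k x k kernel slice of the frozen base weights
    (shape c_out x c_in x k x k) by one entry of the modulation A @ B."""
    modulation = a @ b  # shape (c_out, c_in)
    return base_weights * modulation[:, :, None, None]


def adapter_param_count(c_out, c_in, rank):
    """Per-task parameters for one layer under the assumed factorisation."""
    return rank * (c_out + c_in)


rng = np.random.default_rng(0)
base = rng.standard_normal((64, 32, 3, 3))  # shared, frozen filter bank
a, b = init_factors(64, 32, rank=4)         # task-specific, learnable
adapted = modulate(base, a, b)

print("unchanged at init:", np.allclose(adapted, base))
print("adapter params:", adapter_param_count(64, 32, 4), "vs base:", base.size)
```

Under this assumed form, the rank controls the per-task parameter budget: a larger rank gives the adapter more freedom at the cost of more task-specific parameters, which is one way to realise the parameter-accuracy trade-off the abstract mentions.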