Pre-trained models produce strong generic representations that can be adapted via fine-tuning. The learned weight difference relative to the pre-trained model, known as a task vector, characterises the direction and stride of fine-tuning. The significance of task vectors is such that simple arithmetic operations on them can be used to combine diverse representations from different domains. This paper builds on these properties of task vectors and aims to answer (1) whether components of task vectors, particularly parameter blocks, exhibit similar characteristics, and (2) how such blocks can be used to enhance knowledge composition and transfer. To this end, we introduce aTLAS, an algorithm that linearly combines parameter blocks with different learned coefficients, resulting in anisotropic scaling at the task vector level. We show that such linear combinations explicitly exploit the low intrinsic dimensionality of pre-trained models, with only a few coefficients being the learnable parameters. Furthermore, composition of parameter blocks leverages the already learned representations, thereby reducing the dependency on large amounts of data. We demonstrate the effectiveness of our method in task arithmetic, few-shot recognition and test-time adaptation, with supervised or unsupervised objectives. In particular, we show that (1) learned anisotropic scaling allows task vectors to be more disentangled, causing less interference in composition; (2) task vector composition excels with scarce or no labeled data and is less prone to domain shift, thus leading to better generalisability; (3) mixing the most informative parameter blocks across different task vectors prior to training can reduce the memory footprint and improve the flexibility of knowledge transfer. Moreover, we show the potential of aTLAS as a PEFT method, particularly with less data, and demonstrate its scalability.
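The composition the abstract describes can be sketched as follows: each parameter block of each task vector gets its own scalar coefficient, and the scaled blocks are summed onto the pre-trained weights. This is a minimal illustrative sketch, not the paper's implementation; the block names, toy values, and fixed coefficients are assumptions (in practice the coefficients are the learned parameters).

```python
import numpy as np

def compose_task_vectors(pretrained, task_vectors, coeffs):
    """Anisotropic block-wise composition (illustrative sketch).

    pretrained:   dict mapping block name -> weight array (theta_0)
    task_vectors: list of dicts with the same keys (tau_k = theta_k - theta_0)
    coeffs:       list of dicts, coeffs[k][name] is the scalar applied to
                  block `name` of task vector k (the learnable parameters)
    """
    composed = {name: w.copy() for name, w in pretrained.items()}
    for k, tv in enumerate(task_vectors):
        for name, block in tv.items():
            # Each block is scaled by its own coefficient, hence the
            # scaling is anisotropic at the task-vector level.
            composed[name] += coeffs[k][name] * block
    return composed

# Toy example: two "parameter blocks" and two task vectors.
pretrained = {"layer1": np.zeros(3), "layer2": np.zeros(2)}
tv_a = {"layer1": np.ones(3), "layer2": np.ones(2)}
tv_b = {"layer1": np.full(3, 2.0), "layer2": np.full(2, 2.0)}
# Hypothetical per-block coefficients (these would be optimised).
coeffs = [{"layer1": 0.5, "layer2": 0.1},
          {"layer1": 0.25, "layer2": 0.4}]

merged = compose_task_vectors(pretrained, [tv_a, tv_b], coeffs)
# layer1: 0.5*1 + 0.25*2 = 1.0;  layer2: 0.1*1 + 0.4*2 = 0.9
```

Note that only the coefficient dictionaries are trainable here, which is why the method exploits the low intrinsic dimensionality of the pre-trained model: the number of learnable scalars is the number of blocks times the number of task vectors, regardless of model size.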