Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks, even with limited data. Recent works by Basu et al. (2023) and Kaba et al. (2022) propose group averaging (equitune) and optimization-based methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While Kaba et al. (2022) are only concerned with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide better-quality features for certain transformations than for others, and simply averaging them is deleterious. Hence, we propose λ-equitune, which averages the features using importance weights, λs. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that λ-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of Kaba et al. (2022) used with appropriate loss functions, which we call equizero, also gives excellent zero-shot and finetuned performance. Both equitune and equizero are special cases of λ-equitune. To show the simplicity and generality of our method, we validate it on a wide range of diverse applications and models, such as 1) image classification using CLIP, 2) deep Q-learning, 3) fairness in natural language generation (NLG), 4) compositional generalization in languages, and 5) image classification using pretrained CNNs such as ResNet and AlexNet.
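The weighted average described above can be illustrated with a minimal sketch for the invariant case (e.g. classification), assuming the normalized form Σ_g λ(gx) M(gx) / Σ_g λ(gx), where M is the pretrained model and the inverse action g⁻¹ on the output is trivial because the output is invariant. The group is assumed to be C4 (90° rotations) acting on a toy square grid. All names here (`toy_model`, `toy_weight`, `lambda_equitune`, and the helpers) are hypothetical: `toy_weight` is a fixed positive function standing in for the small learned network that produces λ, and the equizero score function stands in for a negated proxy loss. This is an illustration of the idea, not the authors' implementation.

```python
import math
from typing import Callable, List

Grid = List[List[float]]
Model = Callable[[Grid], List[float]]


def rot90(x: Grid) -> Grid:
    """Rotate a square grid by 90 degrees counterclockwise."""
    n = len(x)
    return [[x[j][n - 1 - i] for j in range(n)] for i in range(n)]


def group_orbit(x: Grid) -> List[Grid]:
    """Return [x, gx, g^2 x, g^3 x] for the cyclic rotation group C4."""
    orbit = [x]
    for _ in range(3):
        orbit.append(rot90(orbit[-1]))
    return orbit


def lambda_equitune(model: Model,
                    weight_fn: Callable[[Grid], float],
                    x: Grid) -> List[float]:
    """Invariant lambda-equitune: importance-weighted average of model(gx).

    weight_fn plays the role of lambda and must return a positive number.
    For an invariant output, g^{-1} acts as the identity, so it is omitted.
    """
    orbit = group_orbit(x)
    weights = [weight_fn(gx) for gx in orbit]
    feats = [model(gx) for gx in orbit]
    total = sum(weights)
    return [sum(w * f[d] for w, f in zip(weights, feats)) / total
            for d in range(len(feats[0]))]


def equitune(model: Model, x: Grid) -> List[float]:
    """Equitune as the special case lambda = 1: a plain group average."""
    return lambda_equitune(model, lambda _: 1.0, x)


def equizero(model: Model,
             score_fn: Callable[[Grid], float],
             x: Grid) -> List[float]:
    """Equizero-style selection: keep only the best-scoring orbit element.

    Equivalent to lambda-equitune with one-hot weights. Assumes the maximum
    score is unique; ties would make the choice depend on orbit order.
    """
    orbit = group_orbit(x)
    best = max(orbit, key=score_fn)
    return model(best)


# Hypothetical stand-in for a pretrained, non-invariant model.
def toy_model(x: Grid) -> List[float]:
    return [sum(x[0]), 2.0 * x[0][0] + x[-1][-1]]


# Hypothetical stand-in for the small lambda network (positive by construction).
def toy_weight(x: Grid) -> float:
    return math.exp(x[0][0])


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    rx = rot90(x)
    print(toy_model(x), toy_model(rx))  # differ: the base model is not invariant
    print(equitune(toy_model, x))  # [5.0, 7.5]
    print(lambda_equitune(toy_model, toy_weight, x))
    print(lambda_equitune(toy_model, toy_weight, rx))  # same up to float rounding
    print(equizero(toy_model, toy_weight, x), equizero(toy_model, toy_weight, rx))  # [7.0, 9.0] [7.0, 9.0]
```

Rotating the input only permutes the orbit, so the weights and features are permuted together and the weighted average is unchanged; setting every weight to 1 recovers equitune, and a one-hot weight on the best-scoring element recovers equizero, mirroring the special-case relationship stated above.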