Efficient transfer learning algorithms are key to the success of foundation models on diverse downstream tasks, even with limited data. Recent works by \cite{basu2022equi} and \cite{kaba2022equivariance} propose group averaging (\textit{equitune}) and optimization-based methods, respectively, over features from group-transformed inputs to obtain equivariant outputs from non-equivariant neural networks. While \cite{kaba2022equivariance} are concerned only with training from scratch, we find that equitune performs poorly on equivariant zero-shot tasks despite good finetuning results. We hypothesize that this is because pretrained models provide higher-quality features for some transformations than for others, so that simply averaging them is deleterious. Hence, we propose $\lambda$-\textit{equitune}, which averages the features using \textit{importance weights}, $\lambda$s. These weights are learned directly from the data using a small neural network, leading to excellent zero-shot and finetuned results that outperform equitune. Further, we prove that $\lambda$-equitune is equivariant and a universal approximator of equivariant functions. Additionally, we show that the method of \cite{kaba2022equivariance}, used with appropriate loss functions, which we call \textit{equizero}, also gives excellent zero-shot and finetuned performance. Both equitune and equizero are special cases of $\lambda$-equitune. To show the simplicity and generality of our method, we validate it on a diverse range of applications and models: 1) image classification using CLIP, 2) deep Q-learning, 3) fairness in natural language generation (NLG), 4) compositional generalization in languages, and 5) image classification using pretrained CNNs such as ResNet and AlexNet.
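
As a minimal sketch of the weighted averaging described above, the following notation is assumed for illustration and is not fixed by the text: $G$ is a finite group acting on inputs $x$ and linearly on outputs, $M$ is the pretrained model, and $\lambda(\cdot) > 0$ is the importance weight produced by the small network. Under these assumptions, one way to write the $\lambda$-equitune output is
\begin{equation}
M_G^{\lambda}(x) \;=\; \frac{1}{\sum_{g \in G} \lambda(gx)} \sum_{g \in G} \lambda(gx)\, g^{-1} M(gx).
\end{equation}
Equivariance can then be checked by reindexing the sum with $g' = gh$ for any $h \in G$, so that $g^{-1} = h\, g'^{-1}$:
\begin{equation}
M_G^{\lambda}(hx) \;=\; \frac{1}{\sum_{g' \in G} \lambda(g'x)} \sum_{g' \in G} \lambda(g'x)\, h\, g'^{-1} M(g'x) \;=\; h\, M_G^{\lambda}(x).
\end{equation}
In this form, the constant choice $\lambda \equiv 1$ reduces to plain group averaging, which is how equitune would arise as a special case. Equizero would plausibly correspond to weights concentrated on a single group element selected by the loss; this reading is an assumption of the sketch, and that limiting case relaxes strict positivity of $\lambda$.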