This work develops a framework for post-training augmentation invariance, in which our goal is to add invariance properties to a pretrained network without altering its behavior on the original, non-augmented input distribution. We define this notion precisely and additionally introduce augmented encoders, which are probabilistic encoders that formalize augmentation-based encoding processes and that serve as our fundamental object of study. We introduce two losses for augmented encoders, namely, Markov-Wasserstein minimization and Wasserstein correlation maximization, and we demonstrate empirically that both losses can be used to train lightweight, one-hidden-layer MLP adapter networks $E_\theta$ that, when appended to the latent space of a pretrained network $F$, do indeed lead to (approximate) post-training augmentation invariance. For example, on STL10 with $F=\text{DINO}$ features, the composite network $C\circ E_\theta\circ F$, where $C$ is a linear classifier and where $E_\theta$ is one of our proposed adapter networks, achieves 94% classification accuracy on arbitrarily rotated images, whereas a network of the form $C\circ F$ without the adapter $E_\theta$ drops to 71% accuracy. Similarly, we can boost noise-invariant classification results from 58% up to 86%. Significantly, we obtain these results with no fine-tuning (the weights of $F$ remain frozen throughout), and our methods introduce little corruption to the original features, since $E_\theta$ acts nearly isometrically on the non-augmented latent distribution. In contrast, we show that adapter networks trained with alternative candidate losses, specifically SimCLR and HSIC maximization, produce uncompetitive classification results and fundamentally corrupt the original latent space. Code is available at https://github.com/keenan-eikenberry/augmentation_invariance
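The composite $C\circ E_\theta\circ F$ and the near-isometry property described above can be illustrated with a minimal, dependency-free sketch. Everything named here is hypothetical and not taken from the linked repository: `frozen_features` is a toy stand-in for the pretrained network $F$, `Adapter` is a one-hidden-layer MLP whose ReLU activation and initialization are assumptions, and `distance_distortion` is just one simple way to gauge how far a map is from an isometry on a sample of latents. The adapter is left untrained, and neither the Markov-Wasserstein nor the Wasserstein correlation loss is implemented.

```python
import math
import random


def init_linear(n_out, n_in, rng):
    """Random weights and zero bias for a dense layer (initialization is an assumption)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply the affine map x -> W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class Adapter:
    """Hypothetical one-hidden-layer MLP adapter E_theta acting on latent vectors."""

    def __init__(self, dim, hidden, rng):
        self.w1, self.b1 = init_linear(hidden, dim, rng)
        self.w2, self.b2 = init_linear(dim, hidden, rng)

    def __call__(self, z):
        # ReLU is an assumed activation; the abstract does not specify one.
        h = [max(0.0, v) for v in linear(self.w1, self.b1, z)]
        return linear(self.w2, self.b2, h)


def frozen_features(x):
    """Toy stand-in for the pretrained network F; it has no trainable parameters."""
    return [math.tanh(v) for v in x]


def classify(weights, bias, z):
    """Linear classifier C: return the index of the largest logit."""
    logits = linear(weights, bias, z)
    return max(range(len(logits)), key=logits.__getitem__)


def distance_distortion(latents, mapping):
    """Largest relative change in pairwise Euclidean distance under `mapping`."""
    mapped = [mapping(z) for z in latents]
    worst = 0.0
    for i in range(len(latents)):
        for j in range(i + 1, len(latents)):
            d_in = math.dist(latents[i], latents[j])
            d_out = math.dist(mapped[i], mapped[j])
            if d_in > 0:
                worst = max(worst, abs(d_out - d_in) / d_in)
    return worst


rng = random.Random(0)
adapter = Adapter(dim=4, hidden=8, rng=rng)
clf_w, clf_b = init_linear(3, 4, rng)

# Toy inputs; F is applied once and never updated.
images = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
latents = [frozen_features(x) for x in images]

# Composite prediction C(E_theta(F(x))).
predictions = [classify(clf_w, clf_b, adapter(z)) for z in latents]

# A value of 0 would mean the adapter is an exact isometry on this sample.
distortion = distance_distortion(latents, adapter)
```

In this sketch a trained adapter would be judged on two counts: the accuracy of `predictions` on augmented inputs, and a small `distortion` on non-augmented latents, which is the sense in which the original features are left largely uncorrupted.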