This work develops a framework for post-training augmentation invariance, in which our goal is to add invariance properties to a pretrained network without altering its behavior on the original, non-augmented input distribution. We define this notion precisely and additionally introduce augmented encoders, which are probabilistic encoders that formalize augmentation-based encoding processes and that serve as our fundamental object of study. We introduce two losses for augmented encoders, namely, Markov-Wasserstein minimization and Wasserstein correlation maximization, and we demonstrate empirically that both losses can be used to train lightweight, one-hidden-layer MLP adapter networks E_theta that, when appended to the latent space of a pretrained network F, do indeed lead to (approximate) post-training augmentation invariance. For example, on STL10 with F = DINOv2 features, the composite network C ∘ E_theta ∘ F, where C is a linear classifier and where E_theta is one of our proposed adapter networks, achieves 94% classification accuracy on arbitrarily rotated images, whereas a network of the form C ∘ F without the adapter E_theta drops to 71% accuracy. Similarly, we can boost noise-invariant classification results from 58% up to 86%. Significantly, we obtain these results with no fine-tuning (the weights of F remain frozen throughout), and our methods introduce little corruption to the original features, since E_theta acts nearly isometrically on the non-augmented latent distribution. In contrast, we show that adapter networks trained with alternative candidate losses, specifically SimCLR and HSIC maximization, produce uncompetitive classification results and fundamentally corrupt the original latent space. Code available at: https://github.com/keenan-eikenberry/augmentation_invariance
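The composite C ∘ E_theta ∘ F described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: `frozen_features` is a hypothetical stand-in for the frozen network F, the ReLU activation, layer sizes, and random weights are assumptions, and the two diagnostics (`invariance_gap`, `distortion`) are simple illustrative proxies rather than the Markov-Wasserstein or Wasserstein correlation losses.

```python
import math
import random


def make_linear(n_in, n_out, rng, scale=0.1):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply a dense layer to the vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def frozen_features(x):
    """Hypothetical stand-in for the frozen pretrained network F."""
    return [math.tanh(v) for v in x]


def adapter(theta, z):
    """One-hidden-layer MLP adapter E_theta on a latent z (ReLU is an assumption)."""
    hidden = [max(0.0, h) for h in linear(theta["hidden"], z)]
    return linear(theta["out"], hidden)


def predict(classifier, theta, x):
    """Class index chosen by the composite C o E_theta o F."""
    logits = linear(classifier, adapter(theta, frozen_features(x)))
    return max(range(len(logits)), key=logits.__getitem__)


def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def invariance_gap(theta, x, x_aug):
    """Distance between adapted latents of an input and its augmented copy."""
    return dist(adapter(theta, frozen_features(x)),
                adapter(theta, frozen_features(x_aug)))


def distortion(theta, xs):
    """Mean ratio of pairwise latent distances after vs. before the adapter.

    A value near 1.0 would indicate a nearly isometric adapter on these inputs.
    """
    zs = [frozen_features(x) for x in xs]
    es = [adapter(theta, z) for z in zs]
    ratios = []
    for i in range(len(zs)):
        for j in range(i + 1, len(zs)):
            before = dist(zs[i], zs[j])
            if before > 0.0:
                ratios.append(dist(es[i], es[j]) / before)
    return sum(ratios) / len(ratios)


rng = random.Random(0)
dim, hidden_dim, n_classes = 8, 16, 10  # illustrative sizes; STL10 has 10 classes
theta = {"hidden": make_linear(dim, hidden_dim, rng),
         "out": make_linear(hidden_dim, dim, rng)}
classifier = make_linear(dim, n_classes, rng)

xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
x_aug = [v + rng.gauss(0.0, 0.1) for v in xs[0]]  # additive noise as the augmentation

print("predicted class:", predict(classifier, theta, xs[0]))
print("invariance gap:", invariance_gap(theta, xs[0], x_aug))
print("distortion:", distortion(theta, xs))
```

With untrained random weights these numbers carry no meaning on their own. In the setting the abstract describes, only the adapter parameters would be trained while F stays frozen, with the aim of driving the invariance gap toward zero while keeping the distortion close to 1.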