Prior work has analyzed the robustness of visual encoders to image transformations and corruptions, particularly when such alterations are not seen during training. In that case, the alterations introduce a distribution shift at test time that often degrades performance. This line of work has focused primarily on severe corruptions which, when applied aggressively, distort the signals needed for accurate semantic predictions. We take a different perspective and analyze parameters of the image acquisition process, as well as transformations that may be subtle or even imperceptible to the human eye. We find that such parameters are systematically encoded in learned visual representations and can be easily recovered. More strikingly, their presence can have a profound impact, either positive or negative, on semantic predictions. The direction of this effect depends on whether semantic labels are strongly correlated or anti-correlated with these acquisition-based or processing-based labels. Our code and data are available at: https://github.com/ryan-caesar-ramos/visual-encoder-traces