Modern vision pipelines increasingly rely on pretrained image encoders whose representations are reused across tasks and models, yet these representations are often overcomplete and model-specific. We propose a simple, training-free method to improve the efficiency of image representations via a post-hoc canonical correlation analysis (CCA) operator. By exploiting the shared structure between representations produced by two pretrained image encoders, our method finds linear projections that serve as a principled form of representation selection and dimensionality reduction, retaining shared semantic content while discarding redundant dimensions. Unlike standard dimensionality reduction techniques such as PCA, which operate on a single embedding space, our approach uses cross-model agreement to guide representation distillation and refinement. The technique reduces representation dimensionality by more than 75% while improving downstream performance, or enhances representations at fixed dimensionality via post-hoc representation transfer from larger or fine-tuned models. Empirical results on ImageNet-1k, CIFAR-100, MNIST, and additional benchmarks show consistent improvements over both baseline and PCA-projected representations, with accuracy gains of up to 12.6%.
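As an illustration of the kind of post-hoc operator described above, the following is a minimal sketch of linear CCA between two sets of paired embeddings, written with NumPy. It is not the authors' implementation: the function names `inv_sqrt` and `fit_cca`, the ridge term `reg`, and the synthetic "encoder" outputs are all assumptions made for this example. The sketch whitens each embedding space, takes an SVD of the whitened cross-covariance, and keeps the top `k` canonical directions as the reduced representation.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * w ** -0.5) @ V.T


def fit_cca(X, Y, k, reg=1e-4):
    """Fit linear CCA on paired rows of X (n, dx) and Y (n, dy).

    Returns the means, the projection matrices Wx (dx, k) and Wy (dy, k),
    and the top-k canonical correlations. `reg` is a small ridge term
    (an assumption of this sketch) that keeps the covariances invertible.
    """
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # (regularized) canonical correlations, in descending order.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky, full_matrices=False)
    return mx, Kx @ U[:, :k], my, Ky @ Vt[:k].T, s[:k]


# Synthetic stand-ins for two encoders: both embed the same 8-dimensional
# latent content, in 64 and 48 dimensions respectively, plus small noise.
rng = np.random.default_rng(0)
n, d_latent = 2000, 8
Z = rng.standard_normal((n, d_latent))
X = Z @ rng.standard_normal((d_latent, 64)) + 0.1 * rng.standard_normal((n, 64))
Y = Z @ rng.standard_normal((d_latent, 48)) + 0.1 * rng.standard_normal((n, 48))

mx, Wx, my, Wy, corrs = fit_cca(X, Y, k=d_latent)

# Reduced representations: 64 -> 8 dimensions for the first encoder.
Zx = (X - mx) @ Wx
Zy = (Y - my) @ Wy
```

In this toy setting the projection `Wx` is chosen using agreement with the second embedding space, whereas a PCA projection of `X` would depend on `Sxx` alone. How many directions to keep, and how the projections are fitted on real encoder outputs, are choices the abstract does not specify.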