Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary greatly. We study the factors influencing transferability and out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which is closely related to intermediate neural collapse. This hypothesis suggests that deeper DNN layers compress representations and hinder OOD generalization. Contrary to earlier work, our experiments show this is not a universal phenomenon. We comprehensively investigate the impact of DNN architecture, training data, image resolution, and augmentations on transferability. We find that training with high-resolution datasets containing many classes greatly reduces representation compression and improves transferability. Our results emphasize the danger of generalizing findings from toy datasets to broader contexts.