Linearly transforming the stimulus representations of deep neural networks yields high-performing models of behavioral and neural responses to complex stimuli. But does the test accuracy of such predictions identify genuine representational alignment? We addressed this question through a large-scale model-recovery study. Twenty diverse vision models were linearly aligned to 4.5 million behavioral judgments from the THINGS odd-one-out dataset and calibrated to reproduce human response variability. For each model in turn, we sampled synthetic responses from its probabilistic predictions, fitted all candidate models to the synthetic data, and tested whether the data-generating model would re-emerge as the best predictor of the simulated data. Model-recovery accuracy improved with training-set size but plateaued below 80%, even at millions of simulated trials. Regression analyses linked misidentification primarily to shifts in representational geometry induced by the linear transformation, as well as to the effective dimensionality of the transformed features. These findings demonstrate that, even with massive behavioral data, overly flexible alignment metrics may fail to guide us toward artificial representations that are genuinely more human-aligned. Model-comparison experiments must be designed to balance the trade-off between predictive accuracy and identifiability, ensuring that the best-fitting model is also the right one.
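The recovery procedure lends itself to a compact simulation. Below is a minimal, hypothetical sketch of the model-recovery loop in Python: random feature matrices stand in for the aligned DNN embeddings, and synthetic odd-one-out responses are sampled from a softmax choice rule over pairwise similarities, as is common for the THINGS triplet task. It omits the linear-alignment fit and the variability calibration described above, and all names are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stimuli, n_dim, n_models = 60, 8, 5

# Hypothetical random feature matrices standing in for the (aligned)
# DNN embeddings of the candidate models.
features = [rng.standard_normal((n_stimuli, n_dim)) for _ in range(n_models)]

def oddity_logits(F, triplets):
    """Logit that each triplet position is the odd one out: the odd item
    is the one outside the most similar pair, so position 0's logit is
    the similarity of the remaining pair (positions 1 and 2), and so on."""
    S = F @ F.T  # dot-product similarity matrix over stimuli
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    return np.stack([S[j, k], S[i, k], S[i, j]], axis=1)

def log_likelihood(F, triplets, choices):
    """Summed log-probability of the observed choices under model F."""
    logits = oddity_logits(F, triplets)
    m = logits.max(axis=1, keepdims=True)  # stabilize the log-sum-exp
    logp = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return logp[np.arange(len(choices)), choices].sum()

# Random triplets of three distinct stimuli.
n_trials = 5_000
triplets = np.array([rng.choice(n_stimuli, size=3, replace=False)
                     for _ in range(n_trials)])

# Model-recovery loop: sample synthetic choices from each model's softmax
# probabilities, score every candidate, and check whether the generator wins.
recovered = 0
for true_idx, F_true in enumerate(features):
    logits = oddity_logits(F_true, triplets)
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    choices = np.array([rng.choice(3, p=row) for row in p])
    scores = [log_likelihood(F, triplets, choices) for F in features]
    recovered += int(np.argmax(scores) == true_idx)

print(f"recovery accuracy: {recovered / n_models:.2f}")
```

In this toy setting, recovery accuracy can be tracked as a function of `n_trials` to reproduce the qualitative pattern reported above: accuracy rises with the number of simulated trials but need not reach 100% when candidate models make highly correlated predictions.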