How do we know whether two systems, biological or artificial, process information in a similar way? Similarity measures such as linear regression, Centered Kernel Alignment (CKA), Normalized Bures Similarity (NBS), and angular Procrustes distance are often used to quantify this similarity. However, it remains unclear what drives high similarity scores, or even what constitutes a "good" score. Here, we introduce a novel tool for investigating these questions: differentiating through similarity measures to directly maximize the score. Surprisingly, we find that a high similarity score does not guarantee that task-relevant information is encoded in a manner consistent with neural data; this problem is particularly acute for CKA and even for some variants of cross-validated, regularized linear regression. We find no consistent threshold for a good similarity score; it depends on both the measure and the dataset. In addition, synthetic datasets optimized to maximize similarity scores initially learn the highest-variance principal component of the target dataset, but measures such as angular Procrustes capture lower-variance dimensions much earlier than measures such as CKA. To shed light on this, we mathematically derive the sensitivity of CKA, angular Procrustes, and NBS to the variance of principal component dimensions, explaining the emphasis CKA places on high-variance components. Finally, by jointly optimizing multiple similarity measures, we characterize their attainable ranges and show that some measures are more constraining than others. While current measures offer a seemingly straightforward way to quantify the similarity between neural systems, our work underscores the need for careful interpretation. We hope practitioners will use the tools we developed to better understand current and future similarity measures.
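To make the measures under discussion concrete, the following is a minimal NumPy sketch of linear CKA between two response matrices. This is an illustrative implementation of the standard linear-CKA formula, not the authors' code; the function name and shapes are assumptions.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two response matrices
    of shape (n_samples, n_features). Returns a score in [0, 1];
    1 means the representations match up to rotation and isotropic scaling."""
    X = X - X.mean(axis=0)  # center each feature dimension
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

Because the score is a smooth function of `Y`, it can be differentiated (e.g. with an autodiff library) and ascended directly, which is the optimization tool the abstract describes. Note that linear CKA is invariant to orthogonal transformations of the feature axes, one reason a high score alone does not pin down how task-relevant information is encoded.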