What metrics should guide the development of more realistic models of the brain? One proposal is to quantify the similarity between models and brains using methods such as linear regression, Centered Kernel Alignment (CKA), and angular Procrustes distance. To better understand the limitations of these similarity measures, we analyze neural activity recorded in five experiments on nonhuman primates and optimize synthetic datasets to become more similar to these neural recordings. How similar can these synthetic datasets be to neural activity while failing to encode task-relevant variables? We find that some measures, such as linear regression and CKA, differ from angular Procrustes and yield high similarity scores even when task-relevant variables cannot be linearly decoded from the synthetic datasets. Synthetic datasets optimized to maximize similarity scores initially learn the first principal component of the target dataset, but angular Procrustes captures higher variance dimensions much earlier than methods like linear regression and CKA. We show in both theory and simulations how these scores change when different principal components are perturbed. Finally, we jointly optimize multiple similarity scores to find their allowed ranges and show that a high angular Procrustes similarity, for example, implies a high CKA score, but not the converse.
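As an illustration of why two of these measures can disagree, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the commonly used definitions of linear CKA (a normalized squared Frobenius norm of the cross-covariance) and angular Procrustes distance (the arccosine of the normalized nuclear norm of the cross-covariance). The function names, the toy singular values, and the choice of a "synthetic dataset" that contains only the first principal component of the target are all assumptions made for this example.

```python
import numpy as np


def center(X):
    """Subtract the mean of each column (feature)."""
    return X - X.mean(axis=0, keepdims=True)


def linear_cka(X, Y):
    """Linear CKA between two (samples x features) matrices, in [0, 1]."""
    X, Y = center(X), center(Y)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den


def angular_procrustes(X, Y):
    """Angular Procrustes distance in radians, in [0, pi/2]."""
    X, Y = center(X), center(Y)
    nuc = np.linalg.norm(X.T @ Y, "nuc")  # sum of singular values of X^T Y
    cos = nuc / (np.linalg.norm(X, "fro") * np.linalg.norm(Y, "fro"))
    return np.arccos(np.clip(cos, 0.0, 1.0))


# Toy target dataset with a steep variance spectrum (assumed values).
rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
Q, _ = np.linalg.qr(center(rng.standard_normal((n_samples, n_features))))
singular_values = np.array([10.0, 3.0, 2.0, 1.0, 1.0])
X = Q * singular_values  # columns are the principal components, scaled

# Hypothetical synthetic dataset: only the first principal component of X.
Y = X[:, :1]

cka = linear_cka(X, Y)
angle = angular_procrustes(X, Y)
print(f"linear CKA:                  {cka:.4f}")
print(f"angular Procrustes distance: {angle:.4f} rad")
print(f"cos(angle)^2:                {np.cos(angle) ** 2:.4f}")
```

In this toy setting, CKA weights each component by its squared variance, while the Procrustes cosine weights it by its singular value, so CKA stays closer to 1 when the lower-variance components are missing. This is consistent with the asymmetry described in the abstract, but the sketch covers only this single constructed case.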