Getting aligned on representational alignment

Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, Thomas Unterthiner, Andrew K. Lampinen, Klaus-Robert Müller, Mariya Toneva, Thomas L. Griffiths

From arXiv. Working paper; changes to be made in upcoming revisions.

Biological and artificial information processing systems form representations that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the extent to which the representations formed by these diverse systems agree? Do similarities in representations then translate into similar behavior? How can a system's representations be modified to better match those of another system? These questions pertaining to the study of representational alignment are at the heart of some of the most active research areas in cognitive science, neuroscience, and machine learning. For example, cognitive scientists measure the representational alignment of multiple individuals to identify shared cognitive priors, neuroscientists align fMRI responses from multiple individuals into a shared representational space for group-level analyses, and ML researchers distill knowledge from teacher models into student models by increasing their alignment. Unfortunately, there is limited knowledge transfer between research communities interested in representational alignment, so progress in one field often ends up being rediscovered independently in another. Thus, greater cross-field communication would be advantageous. To improve communication between these fields, we propose a unifying framework that can serve as a common language between researchers studying representational alignment. We survey the literature from all three fields and demonstrate how prior work fits into this framework. Finally, we lay out open problems in representational alignment where progress can benefit all three of these fields. We hope that our work can catalyze cross-disciplinary collaboration and accelerate progress for all communities studying and developing information processing systems. We note that this is a working paper and encourage readers to reach out with their suggestions for future revisions.
