Determining the similarities and differences between humans and artificial intelligence is an important goal in both machine learning and cognitive neuroscience. However, similarities in representations only inform us about the degree of alignment, not the factors that determine it. Drawing upon recent developments in cognitive science, we propose a generic framework for obtaining comparable representations in humans and deep neural networks (DNNs). Applying this framework to humans and a DNN model of natural images revealed a low-dimensional DNN embedding of both visual and semantic dimensions. In contrast to humans, DNNs exhibited a clear dominance of visual over semantic features, indicating divergent strategies for representing images. While in-silico experiments showed seemingly consistent interpretability of DNN dimensions, a direct comparison between human and DNN representations revealed substantial differences in how they process images. By making representations directly comparable, our results reveal important challenges for representational alignment and offer a means of improving the comparability of human and DNN representations.