Recent research has seen many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Such comparison studies often focus on the end result of the learning process, measuring and comparing the similarity of object-category representations once they have been formed. However, the process by which these representations emerge -- that is, the behavioral changes and intermediate stages observed during their acquisition -- is far less often directly and empirically compared. Here we report a detailed investigation of how transferable representations are acquired by human observers and by a range of classic and state-of-the-art DNNs. We develop a constrained supervised learning environment in which we align learning-relevant parameters such as the starting point, input modality, available input data, and the feedback provided. Across the whole learning process we evaluate and compare how well the learned representations generalize to previously unseen test data. Our findings indicate that, in terms of absolute classification performance, DNNs demonstrate a level of data efficiency comparable to -- and sometimes even exceeding -- that of human learners, challenging some prevailing assumptions in the field. However, comparisons across the entire learning process reveal significant representational differences: whereas DNN learning is characterized by a pronounced generalization lag, humans appear to acquire generalizable representations immediately, without a preliminary phase of learning training-set-specific information that is only later transferred to novel data.