Recent research has produced many behavioral comparisons between humans and deep neural networks (DNNs) in the domain of image classification. Comparison studies often focus on the end result of the learning process, measuring and comparing the similarity of object-category representations once they have been formed. However, the process by which these representations emerge -- that is, the behavioral changes and intermediate stages observed during acquisition -- is less often compared directly and empirically. Here we report a detailed investigation of the learning dynamics in human observers and various classic and state-of-the-art DNNs. We develop a constrained supervised learning environment to align learning-relevant conditions such as starting point, input modality, available input data, and the feedback provided. Across the whole learning process, we evaluate and compare how well learned representations generalize to previously unseen test data. These comparisons indicate that DNNs demonstrate a level of data efficiency comparable to that of human learners, challenging some prevailing assumptions in the field. However, our results also reveal representational differences: while DNNs' learning is characterized by a pronounced generalization lag, humans appear to acquire generalizable representations immediately, without a preliminary phase of learning training set-specific information that is only later transferred to novel data.