Despite years of research and the dramatic scaling of artificial intelligence (AI) systems, a striking misalignment between artificial and human vision persists. Unlike humans, AI relies heavily on texture features rather than shape information, lacks robustness to image distortions, remains highly vulnerable to adversarial attacks, and struggles to recognise simple abstract shapes within complex backgrounds. To close this gap, we take inspiration from how human vision develops from early infancy into adulthood. We quantified visual maturation by synthesising decades of research into a novel developmental visual diet (DVD) for AI vision. Guiding AI systems through this human-inspired curriculum, which accounts for the development of visual acuity, contrast sensitivity, and colour, produces models that align more closely with human behaviour on every hallmark of robust vision tested, yielding the strongest reported reliance on shape information to date, abstract shape recognition beyond the state of the art, and higher resilience to image corruptions and adversarial attacks. Our results thus demonstrate that robust AI vision can be achieved by guiding how a model learns, not merely how much it learns, offering a resource-efficient route toward safer and more human-like artificial visual systems.
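To make the curriculum idea concrete, the following is a minimal sketch of how training images might be degraded as a function of training progress, with blur standing in for low acuity, contrast compression for low contrast sensitivity, and desaturation for immature colour. Everything in it is an assumption made for illustration: the names `dvd_schedule`, `box_blur`, `adjust_pixel`, and `apply_dvd`, the saturating maturation curve, and all constants are hypothetical and are not taken from the paper, which derives its schedule from developmental data.

```python
import math


def dvd_schedule(progress):
    """Map training progress in [0, 1] to degradation strengths.

    The curve and constants are illustrative placeholders only.
    """
    p = min(max(progress, 0.0), 1.0)
    # Assumed shape: fast early maturation that then plateaus.
    maturity = (1.0 - math.exp(-5.0 * p)) / (1.0 - math.exp(-5.0))
    return {
        "blur_radius": round(4 * (1.0 - maturity)),  # low acuity -> strong blur
        "contrast": 0.2 + 0.8 * maturity,            # 1.0 leaves contrast unchanged
        "saturation": maturity,                      # 0.0 is greyscale
    }


def box_blur(img, radius):
    """Box blur over an RGB image, a crude stand-in for reduced acuity."""
    if radius == 0:
        return [row[:] for row in img]
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            n = len(ys) * len(xs)
            row.append(tuple(
                sum(img[j][i][c] for j in ys for i in xs) / n
                for c in range(3)
            ))
        out.append(row)
    return out


def adjust_pixel(px, contrast, saturation):
    """Desaturate toward grey, then compress contrast around mid-grey."""
    grey = sum(px) / 3.0  # simple channel mean as a luminance proxy
    out = []
    for v in px:
        v = grey + saturation * (v - grey)
        v = 0.5 + contrast * (v - 0.5)
        out.append(min(max(v, 0.0), 1.0))
    return tuple(out)


def apply_dvd(img, progress):
    """Degrade an image (rows of (r, g, b) floats in [0, 1]) for a given progress."""
    s = dvd_schedule(progress)
    blurred = box_blur(img, s["blur_radius"])
    return [
        [adjust_pixel(px, s["contrast"], s["saturation"]) for px in row]
        for row in blurred
    ]
```

In a training loop, one would call `apply_dvd(image, step / total_steps)` on each sample, so that early batches are blurry, low-contrast, and nearly colourless while late batches are unaltered. A real pipeline would more likely use a tensor library's Gaussian blur and colour transforms than nested Python lists; the pure-Python form is chosen here only to keep the sketch self-contained.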