Should we care whether AI systems have representations of the world that are similar to those of humans? We provide an information-theoretic analysis suggesting that there should be a U-shaped relationship between the degree of representational alignment with humans and performance on few-shot learning tasks. We confirm this prediction empirically, finding such a relationship in an analysis of the performance of 491 computer vision models. We also show that highly aligned models are more robust to both natural adversarial attacks and domain shifts. Our results suggest that human alignment is often a sufficient, but not necessary, condition for models to make effective use of limited data, be robust, and generalize well.