When trained on large-scale object classification datasets, certain artificial neural network models begin to approximate core object recognition (COR) behaviors and neural response patterns in the primate visual ventral stream (VVS). While recent machine learning advances suggest that scaling model size, dataset size, and compute improves task performance, the impact of scaling on brain alignment remains unclear. In this study, we explore scaling laws for modeling the primate VVS by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, and IT, as well as COR behaviors. We observe that while behavioral alignment continues to improve with larger models, neural alignment saturates. This holds across model architectures and training datasets, even though models with stronger inductive biases and datasets with higher-quality images are more compute-efficient. Increased scale is especially beneficial for higher-level visual areas, where small models trained on few samples achieve only poor alignment. Finally, we derive a scaling recipe indicating that a greater proportion of compute should be allocated to data samples rather than model size. Our results suggest that while scaling alone might suffice for alignment with human core object recognition behavior, it will not yield improved models of the brain's visual ventral stream with current architectures and datasets, highlighting the need for novel strategies in building brain-like models.
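To make the notion of a saturating scaling curve concrete, the sketch below fits a hypothetical saturating power law of the form score(C) = s_inf − a·C^(−α), where C is training compute. This is an illustrative toy fit with made-up numbers, not the paper's actual functional form or data; the function names and values are assumptions for demonstration. A saturating neural-alignment curve corresponds to the score approaching the ceiling s_inf well within the tested compute range, while behavioral alignment would still be far from its ceiling.

```python
import numpy as np

def saturating_score(C, s_inf, a, alpha):
    """Hypothetical saturating scaling law: score(C) = s_inf - a * C**(-alpha)."""
    return s_inf - a * C ** (-alpha)

def fit_alpha(C, scores, s_inf):
    """Recover (alpha, a) assuming the ceiling s_inf is known.

    With s_inf fixed, the model is linear in log-space:
        log(s_inf - score) = log(a) - alpha * log(C)
    so an ordinary least-squares line fit suffices.
    """
    x = np.log(C)
    y = np.log(s_inf - scores)
    slope, log_a = np.polyfit(x, y, 1)
    return -slope, np.exp(log_a)

# Synthetic "alignment" scores at increasing compute budgets (FLOPs; made-up).
C = np.logspace(15, 21, 7)
scores = saturating_score(C, s_inf=0.8, a=50.0, alpha=0.12)

alpha_hat, a_hat = fit_alpha(C, scores, s_inf=0.8)
print(round(alpha_hat, 3))  # recovers the exponent 0.12 from noiseless data
```

On noiseless synthetic data the log-space fit recovers the exponent exactly; on real benchmark scores one would jointly estimate s_inf as well, e.g. by nonlinear least squares.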