When trained on large-scale object classification datasets, certain artificial neural network models begin to approximate core object recognition (COR) behaviors and neural response patterns in the primate visual ventral stream (VVS). While recent machine learning advances suggest that scaling model size, dataset size, and compute improves task performance, the impact of scaling on brain alignment remains unclear. In this study, we explore scaling laws for modeling the primate VVS by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, and IT, as well as COR behaviors. We observe that while behavioral alignment continues to improve with larger models, neural alignment saturates. This holds across model architectures and training datasets, even though models with stronger inductive biases and datasets with higher-quality images are more compute-efficient. Increased scale is especially beneficial for higher-level visual areas, where small models trained on few samples achieve only poor alignment. Finally, we derive a scaling recipe indicating that a greater proportion of compute should be allocated to data samples rather than model size. Our results suggest that while scaling alone might suffice for alignment with human core object recognition behavior, it will not yield improved models of the brain's visual ventral stream with current architectures and datasets, highlighting the need for novel strategies in building brain-like models.
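To make the notion of a saturating scaling curve concrete, the sketch below fits a hypothetical saturating power law of the form score(C) = s_inf − a·C^(−α), where C is training compute. This is an illustrative toy fit with made-up numbers, not the paper's actual functional form or data; the function names and values are assumptions for demonstration. A saturating neural-alignment curve corresponds to the score approaching the ceiling s_inf well within the tested compute range, while behavioral alignment would still be far from its ceiling.

```python
import numpy as np

def saturating_score(C, s_inf, a, alpha):
    """Hypothetical saturating scaling law: score(C) = s_inf - a * C**(-alpha)."""
    return s_inf - a * C ** (-alpha)

def fit_alpha(C, scores, s_inf):
    """Recover (alpha, a) assuming the ceiling s_inf is known.

    With s_inf fixed, the model is linear in log-space:
        log(s_inf - score) = log(a) - alpha * log(C)
    so an ordinary least-squares line fit suffices.
    """
    x = np.log(C)
    y = np.log(s_inf - scores)
    slope, log_a = np.polyfit(x, y, 1)
    return -slope, np.exp(log_a)

# Synthetic "alignment" scores at increasing compute budgets (FLOPs; made-up).
C = np.logspace(15, 21, 7)
scores = saturating_score(C, s_inf=0.8, a=50.0, alpha=0.12)

alpha_hat, a_hat = fit_alpha(C, scores, s_inf=0.8)
print(round(alpha_hat, 3))  # recovers the exponent 0.12 from noiseless data
```

On noiseless synthetic data the log-space fit recovers the exponent exactly; on real benchmark scores one would jointly estimate s_inf as well, e.g. by nonlinear least squares.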