OmniMouse: Scaling properties of multi-modal, multi-task Brain Models on 150B Neural Tokens

Konstantin F. Willeke, Polina Turishcheva, Alex Gilbert, Goirik Chakrabarty, Hasan A. Bedel, Paul G. Fahey, Yongrong Qiu, Marissa A. Weis, Michaela Vystrčilová, Taliah Muhammad, Lydia Ntanavara, Rachel E. Froebe, Kayla Ponder, Zheng Huan Tan, Emin Orhan, Erick Cobos, Sophia Sanborn, Katrin Franke, Fabian H. Sinz, Alexander S. Ecker, Andreas S. Tolias

From arXiv. Published at ICLR 2026.

Scaling data and artificial neural networks has transformed AI, driving breakthroughs in language and vision. Whether similar principles apply to modeling brain activity remains unclear. Here we leveraged a dataset of 3.1 million neurons from the visual cortex of 73 mice across 323 sessions, totaling more than 150 billion neural tokens recorded during natural movies, images, and parametric stimuli, together with behavior. We train multi-modal, multi-task models that flexibly support three regimes at test time: neural prediction, behavioral decoding, and neural forecasting, or any combination of the three. OmniMouse achieves state-of-the-art performance, outperforming specialized baselines across nearly all evaluation regimes. We find that performance scales reliably with more data, but gains from increasing model size saturate. This inverts the standard AI scaling story: in language and computer vision, massive datasets make parameter scaling the primary driver of progress, whereas in brain modeling, even in the mouse visual cortex (a relatively simple system), models remain data-limited despite vast recordings. The observation of systematic scaling raises the possibility of phase transitions in neural modeling, where larger and richer datasets might unlock qualitatively new capabilities, paralleling the emergent properties seen in large language models. Code is available at https://github.com/enigma-brain/omnimouse.
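The abstract contrasts two scaling behaviors: steady gains with more data versus saturating gains with more parameters. One common way to quantify such trends is to fit a power law in log-log space and to check whether the local exponent stays roughly constant (still scaling) or shrinks toward zero (saturating). This is an assumption for illustration; the abstract does not state how the paper fits its scaling curves. The sketch below uses NumPy and assumes an error-like metric that decreases as scale grows. The function names `fit_power_law` and `local_exponents` are hypothetical, and the two curves are synthetic placeholders, not OmniMouse results.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y ~= a * x**(-b) by least squares in log-log space; returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A power law is a straight line in log-log coordinates: log y = log a - b log x.
    slope, intercept = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(intercept)), float(-slope)


def local_exponents(x, y):
    """Negated log-log slope between consecutive points.

    Roughly constant positive values suggest continued power-law scaling;
    values shrinking toward 0 suggest saturating returns.
    """
    lx = np.log(np.asarray(x, dtype=float))
    ly = np.log(np.asarray(y, dtype=float))
    return -np.diff(ly) / np.diff(lx)


# Synthetic illustration only (hypothetical numbers, not results from the paper).
sizes = np.array([1e6, 1e7, 1e8, 1e9])
# Pure power law: error keeps falling at the same log-log rate.
err_scaling = 2.0 * sizes ** -0.3
# Same power law plus an irreducible floor: returns diminish with scale.
err_saturating = 0.5 + 2.0 * sizes ** -0.3

a, b = fit_power_law(sizes, err_scaling)  # recovers a ~= 2.0, b ~= 0.3
print(a, b)
print(local_exponents(sizes, err_scaling))     # roughly constant
print(local_exponents(sizes, err_saturating))  # decreasing toward 0
```

A log-log linear fit is the simplest choice here; a fit with an explicit floor term (for example via nonlinear least squares) would be a natural alternative when saturation is expected.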

翻译：数据与人工神经网络的扩展已推动人工智能在语言和视觉领域的突破性进展，但类似原则是否适用于脑活动建模仍不明确。本研究基于来自73只小鼠视觉皮层323个实验会话的310万个神经元数据集（总计超过1500亿个神经令牌），记录内容涵盖自然电影、图像、参数化刺激及行为数据。我们训练了支持测试时灵活切换三种模式的多模态多任务模型：神经预测、行为解码、神经预测与行为解码的任意组合。OmniMouse实现了最先进的性能，在几乎所有评估模式下均超越专用基线模型。研究发现，性能随数据量增加可靠扩展，但扩大模型规模带来的增益趋于饱和。这与标准AI扩展规律相反：在语言与计算机视觉领域，大规模数据集使参数扩展成为主要驱动力；而在脑建模中（即使是对相对简单的小鼠视觉皮层系统），尽管记录规模庞大，模型仍受限于数据量。系统性扩展现象的发现提示神经建模中可能存在的相变——更大、更丰富的数据集或可解锁质变新能力，这与大型语言模型中涌现的特性类似。代码详见https://github.com/enigma-brain/omnimouse。