Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex

Understanding functional representations within higher visual cortex is a fundamental question in computational neuroscience. Artificial neural networks pretrained on large-scale datasets show striking representational alignment with human neural responses. Even so, learning image-computable models of visual cortex still relies on large-scale fMRI datasets collected for each individual. This data acquisition is expensive, time-intensive, and often impractical, which limits how well encoders generalize to new subjects and stimuli. We introduce BraInCoRL, which uses in-context learning to predict voxelwise neural responses from few-shot examples, without any additional finetuning for novel subjects and stimuli. We leverage a transformer architecture that can flexibly condition on a variable number of in-context image stimuli, learning an inductive bias over multiple subjects. During training, we explicitly optimize the model for in-context learning. By jointly conditioning on image features and voxel activations, our model learns to directly generate better-performing voxelwise models of higher visual cortex. We demonstrate that BraInCoRL consistently outperforms existing voxelwise encoder designs in a low-data regime when evaluated on entirely novel images, while also exhibiting strong test-time scaling behavior. The model also generalizes to an entirely new visual fMRI dataset that uses different subjects and different fMRI data acquisition parameters. Further, BraInCoRL improves the interpretability of neural signals in higher visual cortex by attending to semantically relevant stimuli. Finally, we show that our framework enables interpretable mappings from natural language queries to voxel selectivity.
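The core idea is to generate a voxelwise encoder directly from a variable-size set of (image, voxel response) examples. A minimal NumPy sketch can illustrate it. This is not the BraInCoRL implementation. For illustration, it assumes that each generated voxelwise model is a linear readout over frozen image features. It also replaces the full transformer with a single attention-pooling layer and uses random, untrained parameters. The dimensions `D` and `H` and the functions `generate_voxel_encoder` and `predict` are hypothetical. The sketch only shows the data flow: any number of context pairs go in, and encoder weights for one voxel come out. The returned attention weights indicate which context images the generated encoder relied on most.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # hypothetical image-feature dimension (e.g. from a frozen pretrained backbone)
H = 32  # hypothetical hidden width of the in-context model

# Hypothetical, UNTRAINED parameters standing in for a meta-learned transformer.
W_tok = rng.normal(scale=0.1, size=(D + 1, H))  # embeds each [image feature, voxel response] token
q = rng.normal(scale=0.1, size=H)               # query vector for attention pooling over context
W_out = rng.normal(scale=0.1, size=(H, D + 1))  # maps the pooled state to encoder params (w, b)


def generate_voxel_encoder(feats, responses):
    """Map a variable-size context of (image feature, voxel response) pairs
    for ONE voxel to the parameters (w, b) of a linear encoder for that voxel.
    Simplification: a single attention-pooling layer instead of a full transformer."""
    tokens = np.concatenate([feats, responses[:, None]], axis=1)  # (N, D+1): condition jointly on both
    h = np.tanh(tokens @ W_tok)                                   # (N, H) token embeddings
    scores = h @ q                                                # (N,) relevance score per context image
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                                            # softmax over however many images are given
    pooled = attn @ h                                             # (H,) summary of the whole context
    params = pooled @ W_out                                       # (D+1,)
    return params[:D], params[D], attn                            # weights, bias, attention


def predict(feats_new, w, b):
    """Apply the generated linear voxelwise encoder to features of new images."""
    return feats_new @ w + b


# The same parameters accept small or large context sets without retraining.
for n_context in (5, 50):
    ctx_feats = rng.normal(size=(n_context, D))  # synthetic stand-in image features
    ctx_resp = rng.normal(size=n_context)        # synthetic stand-in voxel responses
    w, b, attn = generate_voxel_encoder(ctx_feats, ctx_resp)
    y_hat = predict(rng.normal(size=(3, D)), w, b)
    print(n_context, y_hat.shape, attn.shape)
```

Attention pooling normalizes over however many context images are supplied, so one set of parameters handles 5 or 50 examples. In a trained model, those parameters would be optimized across many subjects so that the generated encoders predict held-out responses well.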