Understanding how neural activity gives rise to perception is a central challenge in neuroscience. We address the problem of decoding visual information from high-density intracortical recordings in primates, using the THINGS Ventral Stream Spiking Dataset. We systematically evaluate the effects of model architecture, training objectives, and data scaling on decoding performance. Results show that decoding accuracy is driven mainly by modeling the temporal dynamics of neural signals rather than by architectural complexity. A simple model combining temporal attention with a shallow MLP achieves up to 70% top-1 image retrieval accuracy, outperforming linear baselines as well as recurrent and convolutional approaches. Scaling analyses reveal predictable diminishing returns with increasing input dimensionality and dataset size. Building on these findings, we design a modular generative decoding pipeline that combines low-resolution latent reconstruction with semantically conditioned diffusion, generating plausible images from 200 ms of brain activity. This framework offers design principles for brain-computer interfaces and semantic neural decoding.
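The temporal-attention-plus-MLP decoder mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the dimensions (20 time bins of 10 ms over a 200 ms window, 256 channels, embedding size 64) and the untrained random weights are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 20 time bins (200 ms at 10 ms bins), 256 channels,
# 128 hidden units, 64-dimensional image embedding for retrieval.
T, C, H, D = 20, 256, 128, 64

x = rng.standard_normal((T, C))           # binned spike counts (e.g. z-scored)

# Temporal attention: score each time bin, softmax over time, weighted pooling.
w_score = rng.standard_normal(C) / np.sqrt(C)
scores = x @ w_score                      # (T,) one relevance score per bin
attn = np.exp(scores - scores.max())
attn /= attn.sum()                        # softmax weights over time bins
pooled = attn @ x                         # (C,) attention-weighted average

# Shallow MLP head: one ReLU hidden layer mapping to the retrieval embedding.
W1 = rng.standard_normal((C, H)) / np.sqrt(C)
W2 = rng.standard_normal((H, D)) / np.sqrt(H)
emb = np.maximum(pooled @ W1, 0.0) @ W2   # (D,) decoded embedding
```

Retrieval then amounts to ranking candidate images by the similarity between `emb` and their (e.g. CLIP-style) image embeddings; with trained weights, the top-ranked image is the prediction scored by top-1 accuracy.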