We address prevailing challenges in brain-powered research, starting from the observation that existing methods rarely recover accurate spatial information and require subject-specific models. To address these challenges, we propose UMBRAE, a unified multimodal decoding of brain signals. First, to extract instance-level conceptual and spatial details from neural signals, we introduce an efficient universal brain encoder for multimodal-brain alignment, and we recover object descriptions at multiple levels of granularity from a subsequent multimodal large language model (MLLM). Second, we introduce a cross-subject training strategy that maps subject-specific features to a common feature space. This allows a single model to be trained on multiple subjects without extra resources, and it even yields superior results compared to subject-specific models. Further, we demonstrate that this strategy supports weakly-supervised adaptation to new subjects using only a fraction of the total training data. Experiments demonstrate that UMBRAE not only achieves superior results on the newly introduced tasks but also outperforms existing methods on well-established tasks. To assess our method, we construct and share with the community BrainHub, a comprehensive brain understanding benchmark. Our code and benchmark are available at https://weihaox.github.io/UMBRAE.
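The cross-subject idea can be illustrated with a toy sketch. Each subject has its own small tokenizer that maps a subject-specific number of voxels into one common feature space. A single shared head then decodes that space toward a target feature space, which stands in for MLLM image features. Training draws batches round-robin across subjects. A new subject is then adapted by fitting only a fresh tokenizer while the shared head stays frozen. This is not UMBRAE's actual architecture or training recipe. Everything here is hypothetical and chosen only for illustration: the linear-plus-`tanh` tokenizers, the dimensions `D_COMMON`/`D_TARGET`, the voxel counts, the subject names, the synthetic data generator, the plain MSE loss, and the learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: subjects differ in voxel count, but every subject is
# decoded into one shared target space (a stand-in for MLLM image features).
VOXELS = {"subj01": 150, "subj02": 120, "subj05": 100}
D_COMMON, D_TARGET = 32, 16


def make_subject(n_vox):
    """Synthetic subject: a subject-specific linear response to stimulus features."""
    return rng.normal(0.0, 1.0 / np.sqrt(D_TARGET), (D_TARGET, n_vox))


def sample(response, batch):
    """Draw stimulus features t and the noisy voxels x they evoke in one subject."""
    t = rng.normal(size=(batch, D_TARGET))
    x = t @ response + 0.1 * rng.normal(size=(batch, response.shape[1]))
    return x, t


class CrossSubjectEncoder:
    """Hypothetical model: per-subject tokenizers into a common space, one shared head."""

    def __init__(self, d_common, d_target):
        self.d_common = d_common
        self.tokenizers = {}
        self.shared = rng.normal(0.0, 1.0 / np.sqrt(d_common), (d_common, d_target))

    def add_subject(self, name, n_vox):
        self.tokenizers[name] = rng.normal(0.0, 1.0 / np.sqrt(n_vox), (n_vox, self.d_common))

    def forward(self, name, x):
        h = np.tanh(x @ self.tokenizers[name])  # features in the common space
        return h, h @ self.shared               # shared head predicts target features

    def loss(self, name, x, t):
        return float(np.mean((self.forward(name, x)[1] - t) ** 2))

    def step(self, name, x, t, lr, update_shared=True):
        """One gradient-descent step on the mean-squared alignment loss."""
        h, y = self.forward(name, x)
        dy = 2.0 * (y - t) / y.size                 # gradient of the mean w.r.t. y
        dz = (dy @ self.shared.T) * (1.0 - h ** 2)  # back through tanh, pre-update head
        if update_shared:
            self.shared -= lr * (h.T @ dy)
        self.tokenizers[name] -= lr * (x.T @ dz)


responses = {s: make_subject(n) for s, n in VOXELS.items()}
held_out = {s: sample(r, 256) for s, r in responses.items()}
model = CrossSubjectEncoder(D_COMMON, D_TARGET)
for s, n in VOXELS.items():
    model.add_subject(s, n)

# Cross-subject training: round-robin batches; every step also updates the shared head.
before = {s: model.loss(s, *held_out[s]) for s in VOXELS}
subjects = list(VOXELS)
for it in range(600):
    s = subjects[it % len(subjects)]
    model.step(s, *sample(responses[s], 64), lr=0.1)
after = {s: model.loss(s, *held_out[s]) for s in VOXELS}

# Adapting to a new (hypothetical) subject: train only its fresh tokenizer on a few batches.
new_response = make_subject(80)
model.add_subject("subj07", 80)
new_eval = sample(new_response, 256)
frozen_head = model.shared.copy()
new_before = model.loss("subj07", *new_eval)
for _ in range(100):
    model.step("subj07", *sample(new_response, 64), lr=0.1, update_shared=False)
new_after = model.loss("subj07", *new_eval)
```

In this sketch the per-subject part is kept small and the decoding head is shared. As a result, supporting another subject only adds one tokenizer's parameters, and adaptation can reuse everything learned from the other subjects. Whether this carries over to real fMRI data depends on the actual encoder design and is not shown here.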