Multi-task Collaborative Pre-training and Individual-adaptive-tokens Fine-tuning: A Unified Framework for Brain Representation Learning

Structural magnetic resonance imaging (sMRI) provides accurate estimates of the brain's structural organization, and learning invariant brain representations from sMRI is an enduring problem in neuroscience. Previous deep representation learning models overlook the fact that the brain, as the core of human cognitive activity, differs from other organs, whose primary attribute is anatomy. Capturing the semantic structure that dominates interindividual cognitive variability is therefore key to representing the brain accurately. Because this high-level semantic information is subtle, distributed, and interdependently latent in brain structure, sMRI-based models must capture fine-grained details and understand how they relate to the global structure. Existing models, however, are optimized with simple objectives, which causes features to collapse into homogeneity and hinders the simultaneous representation of fine-grained information and holistic semantics, resulting in a lack of biological plausibility and cognitive interpretability. Here, we propose MCIAT, a unified framework that combines Multi-task Collaborative pre-training with Individual-Adaptive-Tokens fine-tuning. Specifically, we first combine restorative learning, age-prediction auxiliary learning, and adversarial learning into a joint proxy task for deep semantic representation learning. We then propose a mutual-attention-based token selection method to highlight discriminative features. MCIAT achieves state-of-the-art diagnostic performance on the ADHD-200 dataset compared with several sMRI-based approaches and shows superior generalization on the MCIC and OASIS datasets. Moreover, we studied 12 behavioral tasks and found significant associations between cognitive functions and the representations learned by MCIAT, which verifies the interpretability of the proposed framework.
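To make the idea of mutual-attention-based token selection concrete, the following is a minimal sketch and not the MCIAT implementation. The abstract does not specify how mutual attention is computed, so the sketch assumes a transformer-style attention matrix whose first row and column belong to a class token, and scores each patch token by the product of the normalized class-to-patch and patch-to-class attention weights. The function names `mutual_attention_scores` and `select_tokens`, and the top-k selection rule, are hypothetical.

```python
def _normalise(values):
    """Scale non-negative values so that they sum to 1 (assumes a positive sum)."""
    total = sum(values)
    return [v / total for v in values]


def mutual_attention_scores(attn):
    """Score each patch token by its mutual attention with the class token.

    attn is a square attention matrix given as a list of rows; index 0 is
    assumed to be the class token and indices 1..n-1 the patch tokens.
    The score of patch i is the product of
      - attention from the class token to patch i, normalised over patches, and
      - attention from patch i to the class token, normalised over patches.
    """
    n = len(attn)
    cls_to_patch = _normalise(attn[0][1:])
    patch_to_cls = _normalise([attn[i][0] for i in range(1, n)])
    return [a * b for a, b in zip(cls_to_patch, patch_to_cls)]


def select_tokens(attn, k):
    """Return the indices (in the full token sequence) of the k top-scoring patches."""
    scores = mutual_attention_scores(attn)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Shift by 1 because patch j sits at position j + 1 after the class token.
    return sorted(i + 1 for i in ranked[:k])
```

Under these assumptions, a patch is kept only when the class token attends to it and it attends back to the class token, which is one plausible way to highlight discriminative tokens for an individual scan.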
