SwiFT: Swin 4D fMRI Transformer

Modeling spatiotemporal brain dynamics from high-dimensional data, such as functional Magnetic Resonance Imaging (fMRI), is a formidable task in neuroscience. Existing approaches to fMRI analysis rely on hand-crafted features, but feature extraction risks losing essential information in fMRI scans. To address this challenge, we present SwiFT (Swin 4D fMRI Transformer), a Swin Transformer architecture that learns brain dynamics directly from fMRI volumes in a memory- and computation-efficient manner. SwiFT achieves this by implementing a 4D window multi-head self-attention mechanism and absolute positional embeddings. We evaluate SwiFT on multiple large-scale resting-state fMRI datasets, including the Human Connectome Project (HCP), Adolescent Brain Cognitive Development (ABCD), and UK Biobank (UKB) datasets, to predict sex, age, and cognitive intelligence. Our experiments show that SwiFT consistently outperforms recent state-of-the-art models. Furthermore, by leveraging its end-to-end learning capability, we show that contrastive loss-based self-supervised pre-training of SwiFT can enhance performance on downstream tasks. Additionally, we employ an explainable AI method to identify the brain regions associated with sex classification. To our knowledge, SwiFT is the first Swin Transformer architecture to process 4D spatiotemporal brain functional data in an end-to-end fashion. Our work holds substantial potential for facilitating scalable learning of functional brain imaging in neuroscience research by lowering the hurdles to applying Transformer models to high-dimensional fMRI.
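
To make the efficiency argument concrete, the following is a minimal sketch that counts query-key pairs for one self-attention layer over a 4D token grid, with and without window partitioning. The grid size, the window size, and the helper name `attention_pairs` are illustrative assumptions, not values or code from SwiFT itself; the sketch only shows why restricting attention to local 4D windows reduces the quadratic cost of global attention.

```python
from math import prod


def attention_pairs(grid, window=None):
    """Count query-key pairs for one self-attention layer over a 4D token grid.

    grid:   token counts along (x, y, z, t); hypothetical sizes
    window: tokens per window along each axis; None means global attention
    """
    n_tokens = prod(grid)
    if window is None:
        # Global attention: every token attends to every token.
        return n_tokens * n_tokens
    if any(g % w for g, w in zip(grid, window)):
        raise ValueError("each grid dimension must be divisible by the window size")
    # Windowed attention: tokens attend only within their own 4D window.
    n_windows = prod(g // w for g, w in zip(grid, window))
    tokens_per_window = prod(window)
    return n_windows * tokens_per_window ** 2


grid = (16, 16, 16, 20)  # assumed token grid after patch embedding
window = (4, 4, 4, 4)    # assumed 4D window size

global_pairs = attention_pairs(grid)
windowed_pairs = attention_pairs(grid, window)
print(global_pairs // windowed_pairs)  # prints 320
```

Under these assumed sizes, the reduction factor equals the number of windows (320), since the windowed cost grows linearly in the number of tokens for a fixed window size while the global cost grows quadratically.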
