Deep-learning models have enabled performance leaps in the analysis of high-dimensional functional MRI (fMRI) data. Yet, many previous methods show suboptimal sensitivity to contextual representations across diverse time scales. Here, we present BolT, a blood-oxygen-level-dependent transformer model for analyzing multivariate fMRI time series. BolT leverages a cascade of transformer encoders equipped with a novel fused window attention mechanism. Encoding is performed on temporally overlapped windows within the time series to capture local representations. To integrate information temporally, cross-window attention is computed between base tokens in each window and fringe tokens from neighboring windows. To gradually transition from local to global representations, the extent of window overlap, and thereby the number of fringe tokens, is progressively increased across the cascade. Finally, a novel cross-window regularization is employed to align high-level classification features across the time series. Comprehensive experiments on large-scale public datasets demonstrate the superior performance of BolT over state-of-the-art methods. Furthermore, explanatory analyses that identify the landmark time points and brain regions contributing most to model decisions corroborate prominent neuroscientific findings in the literature.
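To make the windowing idea concrete, the following is a minimal NumPy sketch of fused window attention under several simplifying assumptions. It is not BolT's implementation. It uses a single attention head and omits residual connections, normalization, MLP blocks, and class tokens. All function names are hypothetical. In the demo, base windows tile the series (stride equal to window length) so that overlap comes only from fringe tokens. The linear fringe schedule is an illustrative choice. The cross-window regularizer is one plausible form: the spread of per-window features around their mean, applied here to mean-pooled features standing in for classification tokens.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fused_window_attention(x, window, stride, fringe, wq, wk, wv):
    """Hypothetical single-head fused window attention.

    Queries come only from a window's base tokens; keys/values come from the
    base tokens plus `fringe` neighboring tokens on each side (clipped at the
    sequence ends). x: (T, d); wq/wk/wv: (d, d_h).
    Returns an array of shape (num_windows, window, d_h).
    """
    T = x.shape[0]
    q_all, k_all, v_all = x @ wq, x @ wk, x @ wv
    outputs = []
    for start in range(0, T - window + 1, stride):
        end = start + window
        lo, hi = max(0, start - fringe), min(T, end + fringe)
        q = q_all[start:end]               # base-token queries
        k, v = k_all[lo:hi], v_all[lo:hi]  # base + fringe keys/values
        scores = q @ k.T / np.sqrt(q.shape[-1])
        outputs.append(softmax(scores) @ v)
    return np.stack(outputs)


def fringe_schedule(num_layers, base_fringe):
    # Illustrative linear schedule: fringe size (and overlap) grows per layer.
    return [base_fringe * (layer + 1) for layer in range(num_layers)]


def cross_window_regularization(window_features):
    # Assumed form: mean squared distance of each window's feature vector
    # from the average feature vector across windows.
    centroid = window_features.mean(axis=0, keepdims=True)
    return float(((window_features - centroid) ** 2).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, d, window = 40, 8, 10
    h = rng.standard_normal((T, d))  # e.g., one token per fMRI time point
    for fringe in fringe_schedule(num_layers=3, base_fringe=2):
        wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
        # stride == window: base windows tile the series, so every time point
        # is a base token exactly once and outputs can be reassembled.
        out = fused_window_attention(h, window, window, fringe, wq, wk, wv)
        h = out.reshape(T, d)  # feed reassembled base tokens to the next layer
    window_feats = out.mean(axis=1)  # stand-in for per-window class tokens
    print(out.shape, cross_window_regularization(window_feats))
```

The sketch has two limiting cases. With `fringe=0`, each window attends only within itself, which corresponds to purely local attention. With a fringe as long as the series, every base token attends to all time points, which reduces to full self-attention. Growing the fringe across the cascade moves the model between these extremes.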