Bayesian Image-on-Scalar Regression (ISR) offers significant advantages for neuroimaging data analysis, including flexibility and the ability to quantify uncertainty. However, its application to large-scale imaging datasets, such as those in the UK Biobank, is hindered by two problems: the computational demands of traditional posterior computation methods, and individual-specific brain masks that deviate from the common mask assumed by standard ISR approaches. To address these challenges, we introduce a novel Bayesian ISR model that is scalable and accommodates inconsistent brain masks across subjects in large-scale imaging studies. Our model leverages Gaussian process priors and integrates salience area indicators to facilitate ISR. We develop a scalable posterior computation algorithm that combines stochastic gradient Langevin dynamics with memory mapping techniques, so that computation time scales linearly with subsample size and memory usage is constrained only by the batch size. Our approach uniquely enables direct spatial posterior inferences on brain activation regions. The efficacy of our method is demonstrated through simulations and an analysis of UK Biobank task fMRI data comprising 38,639 subjects and over 120,000 voxels per image. In various simulation scenarios, it runs 4 to 11 times faster and increases statistical power by 8% to 18% compared with traditional Gibbs sampling with zero-imputation.
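To make the pairing of stochastic gradient Langevin dynamics with memory mapping concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a toy voxelwise model Y_i(v) = x_i * beta(v) + noise with a single scalar covariate, a known noise variance, and an independent Gaussian prior on each beta(v) standing in for the Gaussian process prior; brain masks and salience area indicators are omitted. The function names `write_images` and `sgld_voxelwise`, the file layout, and all default parameter values are hypothetical choices made for illustration. Images are stored on disk and read one mini-batch at a time through `numpy.memmap`, so peak memory depends on the batch size rather than on the number of subjects.

```python
import os
import tempfile

import numpy as np


def write_images(path, x, beta_true, noise_sd, rng, chunk=256):
    """Simulate images row by row and store them in a disk-backed array.

    Hypothetical toy model: Y[i, v] = x[i] * beta_true[v] + Gaussian noise.
    """
    n, n_voxels = len(x), len(beta_true)
    Y = np.memmap(path, dtype=np.float32, mode="w+", shape=(n, n_voxels))
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        noise = rng.normal(scale=noise_sd, size=(stop - start, n_voxels))
        Y[start:stop] = x[start:stop, None] * beta_true[None, :] + noise
    Y.flush()
    del Y  # close the writable map


def sgld_voxelwise(path, x, n_voxels, batch_size=100, n_iter=600,
                   burn_in=100, step=1e-4, noise_var=1.0, prior_var=10.0,
                   seed=0):
    """Return the SGLD estimate of the posterior mean of beta.

    Assumption: independent N(0, prior_var) priors per voxel, used here
    in place of a Gaussian process prior to keep the sketch short.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    # Read-only memory map: rows are pulled from disk only when indexed.
    Y = np.memmap(path, dtype=np.float32, mode="r", shape=(n, n_voxels))
    beta = np.zeros(n_voxels)
    running_sum = np.zeros(n_voxels)

    for t in range(n_iter):
        # Draw a mini-batch of subjects; sorted indices read the file in order.
        idx = np.sort(rng.choice(n, size=batch_size, replace=False))
        Yb = np.asarray(Y[idx], dtype=np.float64)  # only this batch in memory
        xb = x[idx]

        # Mini-batch gradient of the log-likelihood, rescaled to the full data.
        resid = Yb - np.outer(xb, beta)
        grad_lik = (n / batch_size) * (xb @ resid) / noise_var
        grad_prior = -beta / prior_var

        # Langevin update: half-step along the gradient plus N(0, step) noise.
        beta = (beta + 0.5 * step * (grad_lik + grad_prior)
                + np.sqrt(step) * rng.standard_normal(n_voxels))

        if t >= burn_in:
            running_sum += beta

    return running_sum / (n_iter - burn_in)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, n_voxels = 2000, 50
    x = rng.normal(size=n)
    # The first 10 voxels carry a true effect; the rest are null.
    beta_true = np.where(np.arange(n_voxels) < 10, 1.0, 0.0)
    path = os.path.join(tempfile.mkdtemp(), "images.dat")
    write_images(path, x, beta_true, noise_sd=1.0, rng=rng)
    estimate = sgld_voxelwise(path, x, n_voxels)
    print("max abs error:", float(np.max(np.abs(estimate - beta_true))))
```

In this sketch each iteration touches only `batch_size` rows of the image file, so the cost per iteration is proportional to the batch size times the number of voxels. The constant step size is a simplification; a decaying schedule is the more common choice when asymptotic exactness matters.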