Bayesian image-on-scalar regression (ISR) offers significant advantages for neuroimaging data analysis, including flexibility and the ability to quantify uncertainty. However, its application to large-scale imaging datasets, such as the UK Biobank, is hindered by two obstacles: the computational demands of traditional posterior computation methods, and individual-specific brain masks that deviate from the common mask typically assumed in standard ISR approaches. To address these challenges, we introduce a novel Bayesian ISR model that is scalable and accommodates inconsistent brain masks across subjects in large-scale imaging studies. Our model leverages Gaussian process priors and integrates salience area indicators to facilitate ISR. We develop a scalable posterior computation algorithm that combines stochastic gradient Langevin dynamics with memory-mapping techniques, so that computation time scales linearly with subsample size and memory usage is constrained only by the batch size. Our approach uniquely enables direct spatial posterior inference on brain activation regions. We demonstrate the efficacy of our method through simulations and an analysis of UK Biobank task fMRI data comprising 8411 subjects and over 120,000 voxels per image. Across simulation scenarios, it runs 4 to 11 times faster and increases statistical power by 8% to 18% compared with traditional Gibbs sampling with zero-imputation.
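To make the computational idea concrete, the following is a minimal sketch of stochastic gradient Langevin dynamics reading minibatches of subjects from memory-mapped files. It is not the model described above: it assumes independent Gaussian priors on each voxel coefficient in place of Gaussian process priors, omits the salience area indicators, uses a single scalar covariate and a fixed step size, and treats the noise variance as known. The function names `write_demo_data` and `sgld_isr`, the file names, and all parameter values are hypothetical. Subject-specific masks are handled by dropping masked voxels from the gradient, so no imputation is needed.

```python
import os
import tempfile
import numpy as np


def write_demo_data(folder, n=200, n_voxels=50, seed=0):
    """Write synthetic images and subject-specific masks to disk (demo only)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                      # one scalar covariate per subject
    beta_true = np.linspace(-1.0, 1.0, n_voxels)
    mask = rng.random((n, n_voxels)) > 0.1      # about 10% of voxels missing per subject
    y = x[:, None] * beta_true + rng.normal(size=(n, n_voxels))
    y_path = os.path.join(folder, "images.dat")
    m_path = os.path.join(folder, "masks.dat")
    Y = np.memmap(y_path, dtype=np.float32, mode="w+", shape=(n, n_voxels))
    M = np.memmap(m_path, dtype=np.bool_, mode="w+", shape=(n, n_voxels))
    Y[:] = np.where(mask, y, 0.0)               # stored value outside the mask is never used
    M[:] = mask
    Y.flush()
    M.flush()
    return y_path, m_path, x


def sgld_isr(y_path, mask_path, x, n_voxels, batch_size=32, n_iter=2000,
             step=1e-3, sigma2=1.0, tau2=10.0, seed=0):
    """Posterior-mean estimate of voxel-wise coefficients via SGLD.

    Assumed model: y_i(v) = x_i * beta(v) + N(0, sigma2), beta(v) ~ N(0, tau2),
    using only voxels inside subject i's mask.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    # Memory-mapped arrays: only the rows of the current minibatch are read into RAM.
    Y = np.memmap(y_path, dtype=np.float32, mode="r", shape=(n, n_voxels))
    M = np.memmap(mask_path, dtype=np.bool_, mode="r", shape=(n, n_voxels))
    beta = np.zeros(n_voxels)
    running_sum = np.zeros(n_voxels)
    kept = 0
    for t in range(n_iter):
        idx = np.sort(rng.choice(n, size=batch_size, replace=False))
        yb = np.asarray(Y[idx], dtype=np.float64)
        mb = np.asarray(M[idx])
        xb = x[idx, None]
        # Residuals are zeroed outside each subject's mask.
        resid = (yb - xb * beta) * mb
        # Minibatch gradient of the log-likelihood, rescaled to the full sample.
        grad_lik = (n / batch_size) * (xb * resid).sum(axis=0) / sigma2
        grad_prior = -beta / tau2
        # Langevin update: half-step along the gradient plus N(0, step) noise.
        noise = rng.normal(size=n_voxels) * np.sqrt(step)
        beta = beta + 0.5 * step * (grad_lik + grad_prior) + noise
        if t >= n_iter // 2:                    # discard the first half as burn-in
            running_sum += beta
            kept += 1
    return running_sum / kept
```

In this sketch the per-iteration cost depends on the batch size and the number of voxels rather than on the total number of subjects, and peak memory is set by the minibatch, which mirrors the scaling behaviour the abstract describes. The step size must be small relative to the posterior precision, which grows with the number of subjects, so the default here is only suitable for the small demo data.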