Environmental epidemiology has traditionally focused on the health effects of single exposures, more recently with adjustment for co-occurring exposures. Advances in exposure assessment and statistical tools have enabled a shift toward studying multiple exposures and their combined health impacts. Bayesian Kernel Machine Regression (BKMR) is a popular approach for flexibly estimating the joint and nonlinear effects of multiple exposures. However, BKMR faces computational challenges with large datasets, because repeatedly inverting the kernel matrix within Markov chain Monte Carlo (MCMC) algorithms is time-consuming and often infeasible in practice. To address this issue, we propose a faster version of BKMR (Fast BKMR) that uses supervised random Fourier features to approximate the Gaussian process. We use periodic functions as basis functions. This approximation re-frames kernel machine regression as a linear mixed-effects model, which enables computationally efficient estimation and prediction. Bayesian inference is performed by MCMC using Hamiltonian Monte Carlo. We provide R code implementing Fast BKMR. In simulation studies, the approximation yielded results comparable to those of the original Gaussian process while reducing computation time by 29% to 99%, depending on the number of basis functions and the sample size. Our approach is also more robust to kernel misspecification in some scenarios. Finally, we applied this approach to over 270,000 birth records to examine associations between multiple ambient air pollutants and birthweight in Georgia.
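The core idea of the approximation, that periodic basis functions turn a Gaussian-process kernel into an inner product of finite feature vectors, can be illustrated with a small sketch. It makes several assumptions. It uses the standard (unsupervised) random Fourier feature construction with cos/sin pairs, not the authors' supervised variant or their R code. It assumes the Gaussian kernel form commonly used in BKMR, K(z, z') = exp(-sum_m r_m (z_m - z'_m)^2). The scale parameters r, the number of features, and the simulated exposure profiles are hypothetical. Because the kernel becomes K(z, z') ≈ phi(z)^T phi(z'), writing h(z) = phi(z)^T beta with beta ~ N(0, tau I) gives a linear mixed-effects model whose random effects have dimension 2D rather than an n × n kernel to invert.

```python
import math
import random


def gaussian_kernel(z1, z2, r):
    """Gaussian kernel as assumed for BKMR: exp(-sum_m r_m * (z1_m - z2_m)^2)."""
    return math.exp(-sum(rm * (a - b) ** 2 for rm, a, b in zip(r, z1, z2)))


def sample_frequencies(r, n_features, rng):
    """Draw frequency vectors omega_j with components omega_jm ~ N(0, 2 * r_m).

    By Bochner's theorem, E[cos(omega . delta)] = exp(-sum_m r_m * delta_m^2),
    so these frequencies match the Gaussian kernel above.
    """
    return [[rng.gauss(0.0, math.sqrt(2.0 * rm)) for rm in r]
            for _ in range(n_features)]


def fourier_features(z, omegas):
    """Map one exposure vector z to 2D periodic (cos/sin) basis functions."""
    scale = 1.0 / math.sqrt(len(omegas))
    feats = []
    for w in omegas:
        proj = sum(wm * zm for wm, zm in zip(w, z))
        feats.append(scale * math.cos(proj))
        feats.append(scale * math.sin(proj))
    return feats


def approx_kernel(f1, f2):
    """Inner product of two feature vectors; a Monte Carlo estimate of the kernel."""
    return sum(a * b for a, b in zip(f1, f2))


# Hypothetical setup: 3 exposures, 8 simulated exposure profiles, D = 5000 frequencies.
rng = random.Random(2024)
r = [0.5, 1.0, 0.2]
Z = [[rng.uniform(-1.0, 1.0) for _ in r] for _ in range(8)]
omegas = sample_frequencies(r, n_features=5000, rng=rng)

# n x 2D design matrix of basis functions: the random-effects design in the
# linear mixed-model form h = Phi @ beta, beta ~ N(0, tau * I).
Phi = [fourier_features(z, omegas) for z in Z]

# Largest absolute gap between approximate and exact kernel entries.
max_err = max(
    abs(approx_kernel(Phi[i], Phi[j]) - gaussian_kernel(Z[i], Z[j], r))
    for i in range(len(Z))
    for j in range(len(Z))
)
print(f"max |K_approx - K_exact| = {max_err:.4f}")
```

The cos/sin pairing makes each diagonal entry exactly 1, since cos^2 + sin^2 = 1. Off-diagonal errors shrink roughly as 1/sqrt(D), which reflects the trade-off between the number of basis functions and accuracy that governs the reported speedups.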