Wildfire smoke is a growing threat to air quality: it contributes to air pollution through a complex mixture of chemical species, with important implications for public health. While previous studies have primarily focused on its association with total fine particulate matter (PM2.5), the causal relationship between wildfire smoke and the chemical composition of PM2.5 remains largely unexplored. Exposure to these chemical mixtures plays a critical role in shaping public health, yet capturing their relationships requires advanced statistical methods capable of modeling the complex dependencies among chemical species. To fill this gap, we propose a Bayesian causal regression factor model that estimates the multivariate causal effects of wildfire smoke on the concentrations of 27 chemical species in PM2.5 across the United States. Our approach introduces two key innovations: (i) a causal inference framework for multivariate potential outcomes, and (ii) a novel Bayesian factor model that employs a probit stick-breaking process as a prior for treatment-specific factor scores. By focusing on factor scores, our method addresses the missing data challenge common in causal inference and enables a flexible, data-driven characterization of the latent factor structure, which is crucial for capturing the complex correlations among multivariate outcomes. Through Monte Carlo simulations, we show that the model accurately estimates the causal effects on multivariate outcomes and characterizes the treatment-specific latent structure. Finally, we apply our method to US air quality data to estimate the causal effect of wildfire smoke on 27 chemical species in PM2.5, providing a deeper understanding of their interdependencies.
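To make the probit stick-breaking construction named in the abstract more concrete, the following is a minimal sketch of how such a process turns real-valued variables into mixture weights. It is not the authors' implementation: the truncation level, the standard normal draws for `alphas`, and all function names are assumptions chosen purely for illustration, and the treatment-specific structure of the proposed model is not represented.

```python
import math
import random
from typing import List, Sequence


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def probit_stick_breaking_weights(alphas: Sequence[float]) -> List[float]:
    """Map K-1 real values to K non-negative weights that sum to 1.

    Each stick-breaking fraction is v_k = Phi(alpha_k), and
    w_k = v_k * prod_{j<k} (1 - v_j). The final weight takes the
    remaining stick length (a finite truncation, assumed here).
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for alpha in alphas:
        v = normal_cdf(alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


# Illustrative draw: 9 standard normal variables give 10 weights.
# (Hypothetical values; not taken from the paper.)
rng = random.Random(0)
alphas = [rng.gauss(0.0, 1.0) for _ in range(9)]
weights = probit_stick_breaking_weights(alphas)
```

In a covariate- or treatment-dependent version, the `alphas` would themselves be functions of the conditioning variables, so that the weights vary across groups; this sketch only shows the mapping from `alphas` to weights.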