Diet plays a crucial role in health, and understanding the causal effects of dietary patterns is essential for informing public health policy and personalized nutrition strategies. However, causal inference in nutritional epidemiology faces several challenges: (i) high-dimensional, correlated food and nutrient intake data give rise to a very large number of treatment levels; (ii) the scientific interest lies in latent dietary patterns rather than individual food items; and (iii) the goal is to estimate heterogeneous causal effects of these dietary patterns on health outcomes. We address these challenges by introducing an exposure mapping framework that uses factor analysis to reduce the high-dimensional treatment space and to identify dietary patterns. We also extend the Bayesian Causal Forest to accommodate three ordered levels of dietary exposure, which better captures the structure of nutritional data and enables estimation of heterogeneous causal effects. We evaluate the proposed method through extensive simulations and apply it to a multi-center epidemiological study of Hispanic/Latino adults residing in the US. Using high-dimensional dietary data, we identify six dietary patterns and estimate their causal effects on two key health risk factors: body mass index and fasting insulin levels. Our findings suggest that higher consumption of the plant lipid-antioxidant, plant-based, animal protein, and dairy product patterns is associated with reduced risk.
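As a rough illustration of the exposure mapping idea, the sketch below reduces simulated intake data to a few latent pattern scores and then assigns each subject one of three ordered exposure levels per pattern. It rests on several assumptions that do not come from the abstract: the data are synthetic, principal-component factoring of the correlation matrix stands in for the factor analysis, the three levels are defined by tertiles of the scores, and the function name `exposure_levels` is hypothetical. The Bayesian Causal Forest extension is not shown.

```python
import numpy as np


def exposure_levels(intake, n_factors=2):
    """Map intake data (subjects x foods) to ordered 3-level pattern exposures.

    Assumption: principal-component factoring is used as a simple stand-in
    for factor analysis, and tertiles define the low/medium/high levels.
    """
    # Standardize each food/nutrient column.
    z = (intake - intake.mean(axis=0)) / intake.std(axis=0)

    # Eigen-decomposition of the correlation matrix; keep the largest factors.
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_factors]

    # Loadings describe how strongly each food contributes to each pattern.
    loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

    # Pattern scores: projection of each subject onto the retained directions.
    scores = z @ eigvecs[:, top]

    # Tertile cut points per pattern, then ordered levels 0 < 1 < 2.
    cuts = np.quantile(scores, [1 / 3, 2 / 3], axis=0)
    levels = np.stack(
        [np.digitize(scores[:, k], cuts[:, k]) for k in range(n_factors)],
        axis=1,
    )
    return loadings, scores, levels


# Synthetic example: 300 subjects, 12 foods driven by 2 latent patterns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
weights = rng.normal(size=(2, 12))
intake = latent @ weights + 0.5 * rng.normal(size=(300, 12))

loadings, scores, levels = exposure_levels(intake, n_factors=2)
```

Each row of `levels` is a low-dimensional, ordered summary of one subject's diet, and a three-level causal model could take it as the treatment in place of the raw intake vector.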