Childhood asthma is a common illness exacerbated by air pollution as well as by meteorological and neighborhood-level socioeconomic factors. Modeling asthma exacerbation (AE) in large spatiotemporal datasets requires disentangling the effects of multiple contributors. In this case study, we compared three techniques that balance predictive power with interpretability to predict AE in Hampton Roads, a coastal Virginia region comprising seven cities and over 1.5 million people. After collating ambient air pollution measurements, weather data, and measures of neighborhood opportunity, we modeled ZIP code-level acute AE visits to a regional children's hospital and affiliated providers from 2018 to 2023. Generalized linear models (GLMs) provided a baseline, while neural networks (NNs) served as a maximally predictive target. To bridge statistical models and deep learning, we developed a framework based on sparse dictionary learning to identify and interpret parsimonious equations with nonlinear interactions. After comparing each model's predictive performance, we estimated relative risks for AE due to input exposure variables and found consensus across frameworks. Our work links statistical and interpretable machine learning models to highlight possible synergistic interactions influencing AE, and may enable future studies to guide public health interventions in coastal Virginia.
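To make the idea of sparse dictionary learning more concrete, the sketch below shows one generic way such a method can work: build a library of candidate terms (each input plus pairwise interactions) and prune it with sequentially thresholded least squares. This is an illustration under stated assumptions, not the authors' framework. The variable names (`pm25`, `temp`, `coi`), the synthetic data, the continuous response, and the threshold value are all hypothetical; the abstract does not specify the library, the solver, or the link function used for visit counts.

```python
import itertools
import numpy as np


def build_library(X, names):
    """Candidate terms: intercept, each input, and each pairwise product."""
    cols = [np.ones(len(X))]
    labels = ["1"]
    for j, name in enumerate(names):
        cols.append(X[:, j])
        labels.append(name)
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cols.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), labels


def stlsq(Theta, y, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(Theta, y, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        # Refit using only the terms that survived thresholding.
        coef[keep] = np.linalg.lstsq(Theta[:, keep], y, rcond=None)[0]
    return coef


# Synthetic data (hypothetical): the response depends on one input and
# one interaction, plus a small amount of noise.
rng = np.random.default_rng(0)
names = ["pm25", "temp", "coi"]  # hypothetical exposure names
X = rng.normal(size=(500, 3))
y = 0.5 + 0.8 * X[:, 0] - 0.6 * X[:, 0] * X[:, 2] + 0.01 * rng.normal(size=500)

Theta, labels = build_library(X, names)
coef = stlsq(Theta, y)
selected = {lab: float(c) for lab, c in zip(labels, coef) if c != 0.0}
print(selected)
```

The terms that survive thresholding form a short, readable equation. Its interaction coefficients can be inspected directly, which is what lets this kind of model sit between a GLM and a neural network in interpretability.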