Distributed Lag Models (DLMs) and similar regression approaches such as MIDAS have been used for many decades in econometrics, and more recently in the study of air quality and its impact on human health. They are useful not only for quantifying accumulating and delayed effects, but also for estimating the lags that are most susceptible to these effects. Among other things, they have been used to infer the period of exposure to poor air quality that might negatively affect a child's birth weight. The increased attention DLMs have received in recent years reflects their potential to help us understand a great many issues, particularly how the environment affects human health. In this paper we describe how to expand the utility of these models for Bayesian inference by leveraging latent variables. In particular, we explain how to perform binary regression that better handles imbalanced data, how to incorporate negative binomial regression, and how to estimate the probability of predictor inclusion. Extra parameters introduced through the DLM framework may require calibration of the MCMC algorithm, but this is not the case for the DLM-based analyses commonly seen in the pollution exposure literature. In those cases, the parameters are inferred through a fully automatic Gibbs sampling procedure.
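To make the idea of accumulating and delayed effects concrete, the following is a minimal sketch of a plain, unconstrained distributed lag regression fitted by ordinary least squares on simulated data. It is not the Bayesian latent-variable approach described in this paper; the helper `lag_matrix`, the simulated exposure series, and the lag weights in `true_beta` are all illustrative assumptions.

```python
import numpy as np


def lag_matrix(x, max_lag):
    """Build a design matrix whose column l holds x lagged by l steps.

    Rows without a complete lag history are dropped, so the result has
    len(x) - max_lag rows and max_lag + 1 columns (lags 0..max_lag).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.column_stack([x[max_lag - l : n - l] for l in range(max_lag + 1)])


# Simulated exposure series (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=500)

# Assumed lag weights: the effect peaks two steps after exposure.
true_beta = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
max_lag = len(true_beta) - 1

# Outcome at time t is a weighted sum of current and past exposures plus noise.
X = lag_matrix(x, max_lag)
y = X @ true_beta + rng.normal(scale=0.1, size=X.shape[0])

# Unconstrained least-squares estimate of the lag coefficients.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The sum of the lag coefficients estimates the cumulative effect of a
# sustained one-unit increase in exposure.
cumulative_effect = beta_hat.sum()
most_sensitive_lag = int(np.argmax(np.abs(beta_hat)))
```

In this sketch, `most_sensitive_lag` picks out the lag with the largest estimated coefficient and `cumulative_effect` summarises the total accumulated effect. Applied DLMs typically constrain or smooth the lag coefficients rather than estimating them freely as done here.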